Abstract

The dataflow model of computation exposes and exploits parallelism in programs without
requiring programmer annotation; however, instruction-level dataflow is too fine-grained to be
efficient on general-purpose processors. A popular solution is to develop a “hybrid” model of
computation where regions of dataflow graphs are combined into sequential blocks of code. I
have implemented such a system to allow the J-Machine to run Id programs, leaving exposed
a high amount of parallelism, such as among loop iterations. I describe this system and
provide an analysis of its strengths and weaknesses and those of the J-Machine, along with
ideas for improvement.

Chapter 1


Introduction 


If you can look into the seeds of time, 

And say which grain will grow and which will not, 

Speak. 

— William Shakespeare, Macbeth, Act I, Scene iii, line 58. 

This thesis describes a system I designed and implemented to allow programs written
in the dataflow language Id to run on the J-Machine, a massively-parallel general-purpose
computer. The system is functional and includes:

• A compiler that recognizes a significant portion of Id and produces J-Machine assembly
code.

• Library routines to provide operating system functions, fault handlers, and
language-specific features like I-structure storage.

• A strategy for aggressive loop parallelization.

I do not directly address the question of how to sequentialize portions of dataflow graphs. For
this, I took advantage of the work done by Ken Traub on program partitioning [Traub 1988]
and Robert Iannucci for his “dataflow / von Neumann hybrid” architecture and compiler
[Iannucci 1988]. With some optimizations, my system simulates Iannucci’s hybrid
architecture on the J-Machine. In this document, I describe and justify my approach, detail my
transformations, analyze the results, and present my conclusions about the project and future
research on dataflow computation for the J-Machine.


1.1 Background 

A large amount of research has gone into developing and implementing the dataflow model
of parallel computation. In order to exploit the parallelism revealed by dataflow techniques,
special-purpose dataflow machines have been built that are unlike traditional von Neumann
processors, using parallel machine languages and having token and I-structure memory.
Because individual instructions are scheduled dynamically on dataflow processors, run-time
overhead is unnecessarily high. On the other hand, dataflow architectures, with
their per-instruction synchronization, are better than von Neumann machines at
tolerating latency: If the data dependences allow some computation to be performed while the
previously-executing task is waiting for data, the processor will be kept busy. The motivation
for a hybrid architecture is to combine the latency toleration of a dataflow processor with the
efficiency of a von Neumann processor. Often, enough is known at compile-time to specify a
full ordering of a set of instructions, reducing the amount of run-time scheduling necessary.
Hybrid architectures attempt to take advantage of this knowledge by delineating sequences of
instructions whose order can be pre-determined, combining the exposed parallelism of dataflow
with the efficiency of von Neumann computation.[1]

While combining instructions into sequential threads theoretically lessens the amount of
run-time parallelism available, it can be more practical in that it minimizes scheduling
overhead and allows the code to run on computers not dedicated to dataflow processing.
Additionally, even dataflow computers do not attempt to exploit the maximum possible
parallelism. For example, on Monsoon, a specific invocation of a procedure is generally not
divided among processors but takes place on a single one. Instead, the parallelism comes
from pipelining and from running iterations of one loop concurrently on separate processors
[Papadopoulos and Culler 1990], a feature that is retained by hybrid architectures. In order
to ensure that grouping instructions into threads does not lessen the ability to tolerate latency,
we obey “Iannucci’s Injunction” that instructions within a thread may not have unbounded
latency. Instructions with unbounded latency, such as procedure calls and global memory
accesses, cause a thread to suspend, allowing another to execute.

[1] This justification of hybrid architectures based on latency toleration is due to ideas in
[Iannucci 1988, Chapters 1 and 2].

My work includes a compiler back-end to allow dataflow programs to run on the J-Machine,
a general-purpose massively-parallel computer. Although closer to the von Neumann model
than dataflow architectures, the J-Machine has many of the communication and
naming primitives needed for dataflow computation. I built my back-end on top of the Id
compiler developed by the Computation Structures Group at the MIT Laboratory for
Computer Science [Traub 1986a], as augmented by Robert Iannucci to produce code for his hybrid
architecture [Iannucci 1988]. My system transforms his hybrid code to run on the J-Machine.

1.1.1 Id 

Id is a primarily functional language developed in the Computation Structures Group of the
MIT Laboratory for Computer Science for programming dataflow and other parallel
computers. [Nikhil 1988] is a reference for the latest version. All of its features are supported by my
transformations, except for algebraic types, as they postdate Iannucci’s compiler on which
mine is based. A quick overview of pertinent features of the language is presented here.

Types 

The only primitive types in Id are booleans, characters, numbers, character strings, and
symbols.[2] Additionally, there are four pre-defined type constructors that take one or more
types and create new types:

• array types: (1D_array t), (2D_array t), ...

• list types: (list t)

• tuple types: (t0, ..., tn)

• function types: (t0 -> t1)

Id is strongly-typed in that extensive compile-time and run-time type-checking is
performed, but users rarely explicitly provide type information. Additionally, Id allows
polymorphism.

[2] In the latest version of Id, booleans are not primitive but are defined with algebraic types,
which we were unable to support, as described above.

Function Application 

The application of function f with arguments a1, ..., an is written:

f a1 ... an

Id also supports currying: If function f “expects” two arguments, f a, instead of being illegal
as in most languages, returns a function that takes one argument. For example, if plus is
defined as a function that takes two numbers and adds them, plus 3 returns a function that
takes one number as an argument and adds 3 to it. As will be seen later, currying causes
additional overhead in run-time procedure linkage.
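
The behavior of plus 3 can be imitated in Python. This is only a minimal sketch: Python does not curry automatically, so the helper curry2 is a hypothetical stand-in for what Id does implicitly, and the names curry2, plus, and add3 are illustrative rather than part of Id.

```python
def curry2(f):
    """Turn a two-argument function into a chain of one-argument functions."""
    def take_first(a):
        def take_second(b):
            # Only when the last argument arrives is the function really invoked.
            return f(a, b)
        return take_second
    return take_first


plus = curry2(lambda a, b: a + b)
add3 = plus(3)      # partial application: a function still awaiting one argument
result = add3(4)    # supplying the last argument performs the addition
```

The intermediate value add3 plays the role of a closure holding the arguments received so far, which hints at why currying adds work to run-time procedure linkage.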

I-Structures 

One major argument against purely functional languages is their suboptimal efficiency with
arrays. Specifically, it is unnecessarily wasteful to copy an entire array when modifying one
element. Filling in the n elements of a previously-empty array can take O(n^2) time and space,
as the entire array is recopied when each element is written. This problem was partially
solved with I-structures, arrays with elements that can only be written to once. After an
element has been written, reads take place as expected; subsequent writes are a run-time error.
Because no copying is done, filling an array of I-structures takes O(n) time. If a read takes
place before a write, the read is silently deferred until the data is available. This process is
illustrated in Figure 1-1. Out-of-bound accesses to I-structures cause run-time errors. The
properties of I-structures guarantee deterministic behavior in legal programs.[3] While keeping
Id from being purely functional, they greatly improve its efficiency without harming
abstraction. Tuples and arrays, described above, are implemented as I-structures.
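
The write-once and deferred-read rules can be sketched in Python. This is an illustration under stated assumptions, not the representation used by any Id implementation: the class names IStructure and IStructureError are hypothetical, and a deferred read is modeled as a callback that runs when the value arrives.

```python
class IStructureError(Exception):
    """Raised for out-of-bound accesses and for multiple writes."""


class IStructure:
    def __init__(self, low, high):
        self.low, self.high = low, high
        size = high - low + 1
        self.full = [False] * size                 # has the element been written?
        self.values = [None] * size
        self.deferred = [[] for _ in range(size)]  # readers waiting per element

    def _offset(self, index):
        if not (self.low <= index <= self.high):
            raise IStructureError("index %d out of bounds" % index)
        return index - self.low

    def read(self, index, consumer):
        k = self._offset(index)
        if self.full[k]:
            consumer(self.values[k])               # data present: read succeeds
        else:
            self.deferred[k].append(consumer)      # data absent: defer silently

    def write(self, index, value):
        k = self._offset(index)
        if self.full[k]:
            raise IStructureError("element %d written twice" % index)
        self.full[k] = True
        self.values[k] = value
        waiting, self.deferred[k] = self.deferred[k], []
        for consumer in waiting:                   # satisfy the deferred reads
            consumer(value)


cells = IStructure(5, 8)
seen = []
cells.read(8, seen.append)    # deferred: nothing has been written yet
cells.write(8, 42)            # fills the element and satisfies the pending read
cells.read(8, seen.append)    # immediate: the element is now full
```

Because each element is filled in place and never copied, n writes cost time proportional to n in this sketch, matching the O(n) claim above.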

In addition to supporting user types, I-structures are used to create closures for currying
procedure calls. Whenever an argument is applied to a procedure, a check is made whether
the argument supplied is the last one. If so, the procedure is invoked; otherwise, the argument
is added to the I-structure list of arguments and saved into a closure.

Figure 1-1: An FSM Description of an I-structure Location. Originally, an I-structure location
is empty. Reads are silently deferred until data has arrived. Once data has been written,
pending and subsequent read requests can be fulfilled. Writing a location more than once is
a run-time error.

[3] Here and elsewhere, a legal program is one in which no compile-time or run-time errors occur.


Blocks 

Blocks in Id provide a mechanism to bind names to values within the block’s body. A block is
analogous to Lisp’s let construct, except that, as in all Id constructs, the textual order of the
statements is ignored. A block to compute the surface area of a cylinder, given its radius r
and height h, could be written:

{ face = Pi * r * r;
  body = 2 * Pi * r * h
in
  2 * face + body }

Note that it is not always possible to statically determine the order in which statements
in the “declaration” section of a block will execute. Consider the following example from
[Traub 1989, page 2]:

{ p = x > 0;
  a = if p then bb else 3;
  b = if p then 4 else aa;
  aa = a + 5;
  bb = b + 6;
  c = a + b
in
  c};

If x > 0, the only possible order of evaluation is: p, b, bb, a, aa, c. If x <= 0, the
expressions must be evaluated in a different order: p, a, aa, b, c. This is an example of
an Id fragment in which the order of execution of statements cannot be determined at
compile-time. It establishes a theoretical limit on compile-time scheduling, beyond any practical
limits based on insufficiently sophisticated compilers, because no valid compile-time schedule
exists.
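
The dependence of evaluation order on x can be reproduced with a small data-driven evaluator in Python. This is a sketch under assumptions, not a model of any real Id compiler or dataflow machine: each binding fires as soon as the values it actually needs are available, bindings are retried in textual order, and the names run_block and Missing are hypothetical.

```python
class Missing(Exception):
    """Signals that a binding needs a value that is not yet available."""


def run_block(x):
    env = {}

    def get(name):
        if name not in env:
            raise Missing(name)
        return env[name]

    # The bindings of the block, in textual order.
    bindings = [
        ("p", lambda: x > 0),
        ("a", lambda: get("bb") if get("p") else 3),
        ("b", lambda: 4 if get("p") else get("aa")),
        ("aa", lambda: get("a") + 5),
        ("bb", lambda: get("b") + 6),
        ("c", lambda: get("a") + get("b")),
    ]
    order = []
    pending = list(bindings)
    while pending:
        still_waiting = []
        for name, compute in pending:
            try:
                env[name] = compute()      # fire the binding if its inputs exist
                order.append(name)
            except Missing:
                still_waiting.append((name, compute))
        if len(still_waiting) == len(pending):
            raise RuntimeError("no binding can make progress")
        pending = still_waiting
    return env["c"], order


result_pos, order_pos = run_block(1)
result_neg, order_neg = run_block(-1)
```

For x = 1 the firing order is p, b, bb, a, aa, c, and for x = -1 it is p, a, aa, b, bb, c (bb can fire any time after b), so no single fixed ordering of a and b serves both cases.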

Loops 

The format of a loop statement is: 

{for x <- e_index do
<statement> ; 

<statement> 
finally o} 

The keyword next is provided to refer to the value a variable will have in the next loop
iteration. For example, a loop to add the first n integers would be written:

{ sum = 0
in
  { for count <- 1 to n do
      next sum = sum + count
    finally sum }}

The semantics of Id are such that it is possible for multiple iterations of a loop to execute
in parallel. Iannucci’s compiler for the hybrid architecture has loops execute in “parallel”
on a single processor, i.e. statements in iteration i+1 may execute before statements in
iteration i, as long as data dependences are respected. Inner loops are put in separate
codeblocks and can be spawned to separate processors.

Many years have been spent developing and optimizing an Id compiler for the Tagged-Token
Dataflow Architecture [Traub 1986a], a paper dataflow architecture. This compiler was
the base of Iannucci’s and of my research.

1.1.2 Iannucci’s Hybrid Architecture

Development of hybrid architectures is an active area of research. See [Gaudiot and Bic 1989]
for a summary of recent research in the area. One of the best known hybrid architectures is
the EM-4 being developed at the Electrotechnical Laboratory in Japan [Sakai et al 1989]. I
chose to base my work on Iannucci’s system because of the ease with which I could access his
compiler, developed at MIT, as well as its quality.

Iannucci’s extensions to the Id compiler make use of information available at compile-time
to create scheduling quanta (SQs), sequences of code within which the order is specified at
compile-time. Invocation of a codeblock or procedure takes place on a single processor and
generally consists of many SQs.[4] When a procedure is invoked, the instructions in the first SQ
are executed sequentially, suspending at the end of the SQ or if a fault occurs, signifying that
needed data is not ready. The execution of other SQs results from explicit forks.[5] The length
of scheduling quanta is limited by the level of the compiler’s analysis and by the requirements
of Id. Arguments, local variables, and all but the most ephemeral of temporaries are stored
within a frame allocated when the codeblock is invoked. My implementation for the J-Machine
includes all of these characteristics. Further details about Iannucci’s implementation and
architecture will be provided as needed throughout the document. Henceforth, when I write
“the hybrid architecture,” I mean to refer to Iannucci’s architecture.

1.1.3 The J-Machine 

The target of my system is the J-Machine, a massively-parallel MIMD computer based on
the Message-Driven Processor (MDP). Each processor has 260K (4K on chip) of 32-bit-word
memory augmented with 4-bit tags. Tag types include booleans, integers, symbols, and
cfutures. Cfutures generate faults on most operations. The MDPs communicate with each other
through a low-latency network by sending messages. When a message arrives at a processor,
it is written into the message queue. When the message gets to the head of the queue, its first
word is loaded into the instruction pointer, and a pointer to the base of the message is loaded
into an address register so that subsequent words may be accessed. Execution continues
sequentially until an explicit suspend instruction. The first J-Machine is expected to be built
within a year and will have thousands of processors. For my research, I used a simulator of
a 32-node J-Machine [Horwat and Totty 1987]. See [Dally et al 1988b] for a more complete
description of the Message-Driven Processor.

[4] To be exact, it is not always true that a procedure invocation executes on a single processor.
More precisely, a codeblock invocation executes on a single processor. A procedure is usually
one codeblock, but there are exceptions. When interior procedures are lambda-lifted out of a
procedure definition, they constitute separate codeblocks, as do inner loops, so that they can
be spawned among processors. Occasionally in the document, I provide simplified explanations
whose exact details are fleshed out later.

[5] Throughout this document, I use “fork” to mean enabling a continuation on the current
processor and “spawn” for enabling a continuation on another processor.

1.2 Overview 

In Chapter 2, I provide an overview of how the code is executed on the J-Machine,
describing the run-time structures and control structure transformations. Chapter 3 describes my
compiler and how it fits on top of the Id-to-hybrid compiler, as well as showing the code
production templates. Chapter 4 provides benchmarks, including an extended example of
the transformation and execution of a simple factorial program. Chapter 5 is the conclusion,
presenting my retrospective opinions on the project and describing ways in which it could be
improved. The appendices include program examples and source code.





Chapter 2 


Executing Hybrid Code on the 
J-Machine 


The villainy you teach me I will execute, 
and it shall go hard, 
but I will better the instruction. 

— William Shakespeare, The Merchant of Venice, Act III, scene i, line 76.

Because Id is designed for dataflow processors (its name stands for Irvine Dataflow),
its run-time demands are different from those of traditional imperative languages designed
for von Neumann processors. On dataflow architectures, such as the Tagged-Token Dataflow
Architecture and Monsoon, instructions are scheduled individually as soon as the data
dependences have been satisfied. It would not be reasonable to attempt to imitate this on a
non-dataflow architecture: When I hand-compiled Id programs onto the J-Machine with such
a strategy, overhead was extremely high. For a typical dataflow instruction, such as plus, with
two sources and two sinks, 20 MDP instructions were executed [Spertus 1989].

One of the major goals of compiling any language is to do as much work as possible at
compile-time, leaving a minimum of work for run-time. Thus before running dataflow code
on a von Neumann processor, the compiler should sequentialize sequences of instructions as
much as possible. In [Traub 1988], a method of sequentializing regions of code into threads,
or scheduling quanta (SQs), is presented. This lessens the amount of run-time overhead
considerably; however, it does not reduce it to zero. Because it cannot be determined statically
what order the SQs must run in (if it were known, the SQs would already have been combined),
some run-time scheduling is necessary. Specifically, SQs are explicitly forked as soon as
the necessary data might be present. They may begin executing any time thereafter. Within
a SQ, checks are performed to see if necessary data is present. If it is not, the SQ suspends,
to try again once the data is received. Run-time support is necessary for these operations.
In this chapter, I describe the run-time behavior of the programs at a detailed but relatively
high level. I go into lower level detail in the following chapters.

2.1 Overview

Program execution on the J-Machine is based on the same ideas as on the hybrid architecture:
Instructions are grouped into scheduling quanta subject to the following constraints:

1. The program yields the same results as pure dataflow computation.

2. No deadlocks are introduced.

3. An instruction with unbounded latency must not be within a SQ.

Because I work with the scheduling quanta produced by Iannucci’s compiler, I inherit the
assurance that the partitioning yields correct and terminating results [Iannucci 1988, Chapter
4].[1] As Iannucci did, I divide all unbounded-latency tasks into multiple phases so that other
tasks can execute between initiation and fulfillment of a request.

When a codeblock is invoked, a contiguous region of memory called a frame is allocated
for its arguments and scratch variables. The frame is given a unique global name. Because
each invocation has its own data area, the same procedure can execute multiple times on one
processor, with execution of the invocations interleaved. After a codeblock starts executing,
it will probably fault on a slot in its frame, i.e. it will look for a value in a specific slot of
the frame, but the data will not be present. In this case, a continuation is created encoding
the code address and is stored into the offending slot. When the data arrives, the data will be
written into the frame slot and the continuation will be re-enabled. When all of the SQs in a
codeblock have successfully completed and any return values have been sent to the caller, the
frame can be freed. These structures are shown in Figure 2-1. The following sections describe
them in more detail.

Figure 2-1: Run-Time Data Structures. Slots 1 and 4 of the callee’s frame are empty, signifying
that the corresponding data values have not arrived yet and have not been requested. The
data for slots 0, 2, and 3 have arrived. Slot 0 points to the caller’s frame so that the return
value can be sent there. The data for slot 5 has not arrived. The presence of a continuation
list indicates that instructions in the codeblock have tried to access slot 5. When the data
arrives, the SQs indicated in the codeblock will be restarted.

[1] It is not entirely true that I use the SQ divisions unchanged. As will be discussed in the next
chapter, there are a few cases in which I tweak SQs.


2.2 Data Structures 


2.2.1 Codeblocks 

A codeblock consists of one or more scheduling quanta stored contiguously on each processor
on which the procedure might be invoked. Unlike in [Horwat 1989], code is distributed at
load-time. The format of a pointer to a codeblock is shown in Figure 2-2. A user-defined tag value,
CB, is used to indicate a pointer to a codeblock.[2] The low sixteen bits of the descriptor hold
the number of words of storage required for each invocation, and the high sixteen bits hold
the address of the first SQ in the codeblock.

[2] In this context, “user-defined” means defined by my dataflow system, as opposed to the
hardware-specified tag types on the MDP. The MDP has 9 pre-defined tag types and 4
user-defined types.

Codeblock pointer:

  35   32 31             16 15              0
  |  CB  |  Local address  |   Frame size   |

Codeblock:

  SQ1
  SQ2
  ...
  SQN

Figure 2-2: A Pointer to a Codeblock. The user-defined tag CB denotes a pointer to a
codeblock. The low sixteen bits tell how large a frame must be allocated for the codeblock to
execute. The high sixteen bits tell where the codeblock can be found.
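
The field layout of a codeblock pointer can be illustrated with a few bit operations in Python. This is a sketch under assumptions: the numeric value chosen for the CB tag is hypothetical (the text does not give its encoding), and the function names are illustrative only.

```python
CB_TAG = 0x1  # hypothetical tag number; the real encoding of CB is not given


def make_cb_pointer(local_address, frame_size):
    """Pack a 4-bit tag, a 16-bit code address, and a 16-bit frame size."""
    if not (0 <= local_address < 1 << 16 and 0 <= frame_size < 1 << 16):
        raise ValueError("fields must fit in sixteen bits")
    return (CB_TAG << 32) | (local_address << 16) | frame_size


def cb_tag(pointer):
    return (pointer >> 32) & 0xF        # bits 35..32


def cb_local_address(pointer):
    return (pointer >> 16) & 0xFFFF     # high sixteen bits of the data word


def cb_frame_size(pointer):
    return pointer & 0xFFFF             # low sixteen bits of the data word


pointer = make_cb_pointer(local_address=0x1234, frame_size=12)
```

Keeping the frame size in the pointer itself means a caller can allocate a frame of the right size without first fetching anything from the codeblock.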

2.2.2 The Data Stack 

Memory is allocated from a stack, initialized to null cfutures. A cfuture is an MDP data type
on which most instructions fault. Thus, slots are pre-initialized to “empty”. A heap would
be a more efficient representation because memory could be freed and reused, but not enough
time was available to implement one. The three run-time data structures allocated from the
stack are frames, continuations, and I-structures, described in the following sections.

2.2.3 Frames 

For a codeblock to execute, it needs a frame, a contiguous block of storage initialized to
null cfutures (i.e. to empty). A pointer to the base of a frame is called a frame descriptor.
Figure 2-3 shows a frame descriptor and a procedure frame. A user-defined tag value, FD,
is used to indicate a pointer to a frame. The low sixteen bits of the descriptor hold the
node number, and the high sixteen bits hold the local address, combining to provide a global
address. Storing the node number in the low sixteen bits provides an efficiency bonus on the
J-Machine as first described in [Horwat 1989, page 68].

Frame descriptor:

  35   32 31             16 15              0
  |  FD  |  Local address  |  Node number   |

Frame:

  0:  FD of caller
  1:  ISD of argument chain
  2:  unused
  3:  Last argument
      ...
      First argument
      First scratch slot
      ...
      Last scratch slot

Figure 2-3: A Non-Loop Procedure Frame. A user-defined tag, FD, denotes a frame descriptor.
It encodes the unique global address of a frame. The first slot of a frame holds a frame
descriptor indicating where to send return values. The next slot holds the address of the
I-structure chain of arguments. In some cases, the arguments can be passed directly in argument
slots. The remaining slots are used for scratch values during the procedure’s execution.


Slot 0 of the frame holds a frame descriptor telling where to send any return values.
Some subtleties are involved in whether the arguments are passed in argument slots or as
an I-structure chain. I retain Iannucci’s conventions, and the interested reader is referred to
[Iannucci 1988, pages 111-113]. The additional slots present in codeblocks with loops will be
discussed in Section 2.3.3. Except for how I handle loops, my frames are identical to those
used by Iannucci. The base of the frame currently executing is always kept in MDP address
register A2. Taking all frame accesses relative to A2 allows multiple invocations of a procedure
to run on the same processor.


2.2.4 Continuations 

When an attempt is made to read an empty frame slot (i.e. a cfuture), a fault occurs whose
handler does the following:

1. Stores a request to restart the SQ when the data arrives.

2. Suspends, in order to let another SQ execute.

In producing code, I ensure that at the time of a cfuture fault, the MDP register R0 holds a
message indicating where execution should restart. I also take advantage of the MDP’s always
storing the absolute address of the last memory access in the MAR register. This allows the
fault handler to determine which piece of data was missing. The handler allocates a triple
(i.e. three words) from the stack and sets them to the following:

1. A message indicating where execution should restart (taken from R0).

2. The base of the current frame (taken from A2).

3. A pointer to the next continuation (if any) waiting on the faulted location. This is the
old value of the slot.

The address of the triple is tagged as a cfuture and is written into the data location for which
the fault occurred.[3] When the data arrives, the slot is checked just before the data is written.
For every continuation present, the indicated message is sent and the continuation freed.[4]
Because codeblocks execute within one processor, the message is sent from the processor to
itself. J-Machine routing is done in such a manner that this is a cheap operation. Allocating
and filling a continuation after a fault takes 18 cycles. Writing to a frame slot takes 7 cycles
if no continuations are waiting and 8 + 6w cycles if w continuations are waiting.
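
The chaining of continuations through a frame slot can be sketched in Python. This is a simplified model, not the MDP fault handler itself: the names Continuation, fault, and write_slot are hypothetical, None stands in for a null cfuture, a deque stands in for the local message queue, and the returned cycle count assumes the cost of a write with w waiting continuations is 8 + 6w.

```python
from collections import deque

EMPTY = None  # stands in for a null cfuture


class Continuation:
    """The three-word record allocated by the fault handler."""

    def __init__(self, restart_message, frame, next_cont):
        self.restart_message = restart_message  # where execution should restart
        self.frame = frame                      # base of the faulting frame
        self.next_cont = next_cont              # old value of the slot


def fault(frame, slot, restart_message):
    # The new record becomes the head of the chain stored in the slot.
    frame[slot] = Continuation(restart_message, frame, frame[slot])


def write_slot(frame, slot, value, queue):
    """Write data into a slot, restarting every waiter; return the cycle cost."""
    waiting = frame[slot]
    count = 0
    while isinstance(waiting, Continuation):
        queue.append(waiting.restart_message)   # send the restart message
        waiting = waiting.next_cont
        count += 1
    frame[slot] = value
    return 7 if count == 0 else 8 + 6 * count


frame = [EMPTY] * 6
queue = deque()
fault(frame, 5, "restart A")
fault(frame, 5, "restart C")
cycles_with_waiters = write_slot(frame, 5, 99, queue)
cycles_without_waiters = write_slot(frame, 1, 7, queue)
```

Since each new continuation is linked in front of the old slot value, the most recent waiter is restarted first in this sketch.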

An Alternate Method for Continuations 

I considered an alternate method of keeping track of suspended continuations. Instead of
storing the continuation in a tuple allocated from the stack, the system could immediately
send the message indicating where execution should restart, effectively putting it at the end
of the local message queue. When the message reaches the head of the queue, it is tried again.
If the data has arrived, it executes successfully (or at least until the next fault); otherwise, it
will throw itself on the queue again.

This method has several advantages:

1. It seems to fit more elegantly on top of the J-Machine, taking advantage of the message
queue provided.

2. Message suspension executes more quickly.

3. There is no need to check a frame location before writing a value to it.

The disadvantages, however, are major: A SQ could restart and fail many times, using an
unbounded number of machine cycles. Additionally, the MDP message queue could overflow.
For these reasons, I decided not to use this method.

[3] To be precise, a quadruple is sometimes needed instead of a triple, as will be explained in
Section 2.3.3.

[4] Due to the primitive memory management of my system, the locations are freed in concept
only.

Figure 2-4: An I-Structure Descriptor and Storage. An I-structure descriptor includes its type
and a global address that points to a block of storage, holding the bounds and the data.

2.2.5 I-Structures 

I-structures are defined in Section 1.1.1. To review, they are array-like data structures whose
entries can be written once. Reads before writes are silently deferred. (This shows one of the
reasons high latency toleration is necessary.) I-structures are allocated explicitly by the user
and implicitly for argument chains for procedure calls. Due to time constraints, I-structures
are not handled by my compiler; however, I did develop and test the translation methods that
would be used.

Figure 2-4 shows how I-structure descriptors and storage are implemented. I-structure
descriptors are built analogously to frame descriptors, using the user-defined tag name ITAG.
The low and high bounds of the I-structure are stored at the base of the region of storage,
after which the data appear sequentially.

For a given cell of I-structure storage, there are three possible states, corresponding to the
non-error states in Figure 1-1. The possibilities, and how they are indicated, are:

1. Empty, indicated by a null cfuture.

2. Waiting for data, indicated by a cfuture whose value points to a local linked list of
continuations needing the data.

3. Full, indicated by a non-future (i.e. the data itself).

The continuations are of the same form as described in Section 2.2.4. An example of an
I-structure is shown in Figure 2-5.

Figure 2-5: An I-Structure. The lower and upper bounds of this I-structure are 5 and 8,
respectively. When a read or write request arrives, a run-time error occurs if the passed-in
offset is out of bounds. If not, the lower bound is subtracted from the passed-in offset, and
the corresponding cell is examined. In this example, data has been written to I[5] and I[7],
there have been no attempts to read or write I[6], and there have been two reads to I[8] that
will be satisfied when the data arrives. Writing to a slot more than once is a run-time error.

Writing an element of an I-structure takes 20 + 6r instructions, where r is the number
of pending requests. The read handler takes 19 instructions if the data is present and 30
if it is not. These times include comparing against the bounds, subtracting off the lower
bound, ensuring that no more than one write is done, and allocating any memory needed for
continuations.


2.3 Control Structure 

2.3.1 Execution Within a Codeblock 

To see how execution proceeds within a codeblock, let us review the example block from
Section 1.1.1. It is reproduced in Figure 2-6. Consider the possible orders of evaluation:

• If x > 0, b -> bb -> a -> aa -> c.

• If x <= 0, a -> aa -> b -> bb -> c.

def abc x =
  { p = x > 0;
    a = if p then bb else 3;
    b = if p then 4 else aa;
    aa = a + 5;
    bb = b + 6;
    c = a + b;
  in
    c};

Figure 2-6: A Statically Unschedulable Codeblock. It is impossible to determine the order in
which a, b, aa, and bb must be computed without knowing whether x > 0.


Observe that in both cases, b precedes bb, a precedes aa, p is the first calculation, and c is the
last. Using these static dependences, we partition the code into four scheduling quanta, as
shown in Figure 2-7.[5]

Let us consider the case where x > 0. P is the first SQ to execute. As shown in Figure 2-8,
it computes p then forks A, B, and C, in that order, and suspends. A begins, then suspends,
because bb is needed but not available. B, next in the queue, begins and executes to completion.
When it stores bb, it sees that A is waiting on the value and sends a message to restart A. C
then begins executing and faults on a, suspending. The second attempt to execute A is now
at the head of the message queue and completes, sending a request to restart C. C executes,
performing the addition and whatever else follows (such as returning the resulting value).
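
The sequence of events just described can be replayed with a small simulation in Python. This is a sketch under assumptions rather than the actual run-time system: grouping a with aa in SQ A follows the static dependences noted above, a suspended SQ restarts from its beginning here instead of at the faulting instruction, and the names run_abc and Fault are hypothetical.

```python
from collections import deque


class Fault(Exception):
    """Raised when an SQ reads a frame slot whose data has not arrived."""

    def __init__(self, slot):
        super().__init__(slot)
        self.slot = slot


def run_abc(x):
    frame = {}             # frame slots that have been filled
    waiting = {}           # slot name -> SQs suspended on that slot
    queue = deque(["P"])   # the local message queue
    trace = []

    def read(slot):
        if slot not in frame:
            raise Fault(slot)
        return frame[slot]

    def write(slot, value):
        frame[slot] = value
        for sq in waiting.pop(slot, []):
            queue.append(sq)           # send a message restarting each waiter

    def sq_p():
        write("p", x > 0)
        queue.extend(["A", "B", "C"])  # fork A, B, and C, in that order

    def sq_a():
        a = read("bb") if read("p") else 3
        write("a", a)
        write("aa", a + 5)

    def sq_b():
        b = 4 if read("p") else read("aa")
        write("b", b)
        write("bb", b + 6)

    def sq_c():
        write("c", read("a") + read("b"))

    code = {"P": sq_p, "A": sq_a, "B": sq_b, "C": sq_c}
    while queue:
        name = queue.popleft()
        try:
            code[name]()
            trace.append((name, "done"))
        except Fault as fault:
            waiting.setdefault(fault.slot, []).append(name)
            trace.append((name, "waits on " + fault.slot))
    return frame["c"], trace


value, trace = run_abc(1)
```

For x = 1 the trace shows A suspending on bb, B completing, C suspending on a, and then A and C completing in turn; for x <= 0 every SQ in this model completes on its first attempt.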
The astute reader will have noticed that the sample procedure could be reduced to

def abc x =
  if x > 0 then
    14
  else
    11;

Despite this possible compile-time reduction, the example is still relevant for two reasons:
First, the early stages of the compiler are not sophisticated enough to perform the reduction;
second, examples exist for which no such reduction is possible. For example, if in the original
program (Figure 2-6), the bindings for a and b were changed to a = f x bb and b = g x
aa, where f and g are passed in as parameters, no compile-time reductions would be possible
[Traub 1989, page 2].

[5] Throughout the text, partitions are simplified to provide a more intuitive understanding than
would be gained by going into the exact details on how a SQ is produced.

Figure 2-7: Scheduling Quanta for Unschedulable Example. The code in Figure 2-6 is divided
into four scheduling quanta. The calculations for b and bb appear in the same quantum because
bb depends only on b. It is impossible to determine statically whether SQ A or B executes
first. Arrows indicate that one SQ forks another.

Figure 2-8: Snapshots for Codeblock Example. This shows snapshots of the message queue
and frame before each SQ for the program in Figure 2-6.

Figure 2-9: Procedure Linkage Example. Processor A requests a context on processor B. As
soon as the frame is allocated, execution of the procedure call begins on B. When A receives
the context value, it can send the argument(s), after which B can complete. Shaded rectangles
indicate time that could be spent on other tasks. Note that those tasks are not interrupted
when data arrives.


2.3.2 Procedure Calls 

Figure 2-9 shows how procedure linkage is done without tying up either processor. When
processor A wants to call a procedure on processor B, A must allocate a context (frame) on
B for the codeblock’s arguments and scratch area. Allocating a context has the side effect of
starting execution of the first SQ in the procedure. After the address of the frame is returned
to A, it sends the arguments to B, which will have faulted if the data was already needed.
When the data arrives on B, suspended SQs are restarted. After B completes, it sends the
return value (if any) and a signal to A, and it frees its frame. Note that other processes can
execute while A and B are waiting for data.

While it would be more efficient in most cases for a caller to be able to send arguments at
the same time as requesting the context, there was no clean way to do this. An interesting
effect of this policy is that (as in other Id implementations) a procedure can conceivably
do substantial calculation or even return a value before receiving any arguments! This is
necessary because procedure calls in Id are non-strict.

Currently, the system does not do any load-balancing, and it always spawns procedures
to the same processor. The user must adjust the compiled code to provide a distribution
appropriate to the problem.

2.3.3 Loops 

As in all other implementations of Id, I provide a way for different iterations of a given loop
to be active at the same time. Because iterations of an outer loop execute on the same processor, they
do not execute concurrently; instead, the SQs of up to K iterations of a loop are enabled at
a time, where K is the loop-unfolding constant. When a calculation within one iteration is
waiting for something, such as the result of a procedure call to another processor, instructions
from other iterations may execute, subject to data dependences. Because up to K iterations
may execute at once, there must be K places to store each intermediate value, so this method
requires allocating K iteration areas. In [Iannucci 1988, Section 4.3.5], Iannucci presents and
proves the correctness of a method for dynamically unfolding loops which guarantees the same
results as sequential execution. I use his method, although I implement it differently.

Concepts 

In Iannucci’s method, an iteration includes the evaluation of the predicate and subsequent
execution of either the loop body or the loop termination code. He observes that for iteration
i to begin, three conditions must hold:

1. The predicate for iteration (i - 1) has been evaluated to “true”.

2. The (i - K)th iteration has terminated, allowing us to reuse its iteration area.

3. The (i + 1 - K)th iteration must have already consumed its loop variables.

The third condition is the most subtle. It exists because iteration i will write the values of
loop variables into the slots of iteration i + 1. Hence, iteration i cannot execute until iteration
i + 1 - K is done with the values currently stored in these slots.
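
The three conditions can be written as a single predicate. The sketch below is illustrative only: the function name, the use of sets to record progress, and the choice to number iterations from 0 and treat conditions on non-existent earlier iterations as trivially satisfied are all assumptions, not part of Iannucci's formulation.

```python
def may_begin(i, K, predicate_true, terminated, consumed):
    """Return True if iteration i may begin under the three conditions.

    predicate_true: iterations whose predicate evaluated to true
    terminated:     iterations that have terminated
    consumed:       iterations that have consumed their loop variables
    Conditions that refer to iterations before iteration 0 hold vacuously
    (an assumption of this sketch).
    """
    cond1 = i == 0 or (i - 1) in predicate_true        # predicate of i - 1 was true
    cond2 = i - K < 0 or (i - K) in terminated         # area of i - K can be reused
    cond3 = i + 1 - K < 0 or (i + 1 - K) in consumed   # slots of i + 1 are free to overwrite
    return cond1 and cond2 and cond3
```

For K = 2, iteration 1 cannot start merely because iteration 0 found its predicate true; iteration 0 must also have consumed its loop variables, since iteration 1 writes into the area that iteration 0 is reading.
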
Figure 2-10: Possible Implementation of an Iteration Descriptor. The iteration fields hold the
offsets from the frame base of the next, current, and previous iteration areas. The Import and
PC flags tell whether this iteration may begin. Bits 26 through 31 are unused. This format
was not used.


These rules are enforced with two flags, PC and import. Iteration i’s PC flag is set when
the first condition, that the predicate for iteration i - 1 is true, has been established. The
import flag is based on condition three; it is set when the next iteration area is ready to
import new loop variables. In [Iannucci 1988, pages 129-131], Iannucci proves that the rules
for the two flags cover all three conditions. When both of an iteration’s flags are true, its first
SQ (presumably to compute the predicate) may be enabled.

Implementation 

Iannucci’s hybrid architecture supports loops with several special-purpose instructions and
hardware support. Specifically, iteration descriptors, containing the two flags and pointers
to the previous, current, and next iteration areas, can be stored in one machine word. As
Figure 2-10 shows, it was possible to store all these quantities into the MDP’s shorter (32-bit)
words, but, lacking hardware support for accessing these fields, shifting and masking were too
slow. Additionally, in the small amount of space available for each iteration pointer, it was
only possible to store offsets relative to the current frame, not absolute addresses, which would
be more convenient. Hence, I decided not to mimic the hybrid architecture’s implementation,
and I developed my own data structures.

FD of caller
ISP of Argument Chain
K (loop unfolding constant)
Last Argument
First Argument
First Loop Constant
Last Loop Constant
First Scratch Slot
Last Scratch Slot
Pointer to Iter Area -1
Pointer to Iter Area K
Area for Iters 0 mod K
Area for Iters K-1 mod K

Figure 2-11: A Loop Procedure Frame. Loop procedure frames have several sets of slots in
addition to those present in non-loop frames. Slot 2 holds K, the loop-unfolding constant. K
specifies how many iterations may be unrolled. There is space for loop constants, values that
could be hoisted out of the procedure’s loop. Iteration areas are used for circulating variables
and each iteration’s temporaries. The pointers allow quick access to each iteration area.

Figure 2-11 shows a frame for a procedure with a loop. In addition to the slots found in
non-loop frames (see Figure 2-3), it has slots for the loop-unfolding constant, loop constants,
iteration areas, and pointers to the iteration areas. Each iteration area’s flags are stored
within its pointer. The pointers to iteration areas can be viewed in a more conceptual way in
Figure 2-12. In order to support iterations, an additional piece of data, an iteration number
between 0 and K - 1, must be included in every continuation. When a loop SQ begins, the
iteration number is used to find the pointer to the current iteration area. This pointer is
stored in MDP address register A1. Slots relative to the current iteration area can then be
indexed off A1. If it is necessary to access a slot in the previous or next iteration’s area, the
iteration number is decremented or incremented to find the appropriate pointer from the table
of pointers within the frame. This is why there are K + 2 pointers to the K areas; i.e., if
iteration 0 is active and wants to set the previous iteration’s import flag, the pointer can be
retrieved without providing a special check for the boundary condition. The import and PC
flags are stored within the pointers.
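
The effect of keeping K + 2 pointers can be seen in a few lines. This is a minimal sketch, assuming iteration areas are identified by indices 0 to K - 1 rather than by real addresses; the function names are hypothetical.

```python
def build_pointer_table(K):
    """Table entries for iteration numbers -1, 0, ..., K (K + 2 entries).

    Each entry is the index of an iteration area; the two extra entries at
    the ends duplicate area K - 1 and area 0 respectively.
    """
    return [j % K for j in range(-1, K + 1)]


def area_for(table, iteration_number, delta=0):
    """Area of the current (delta=0), previous (-1), or next (+1) iteration.

    iteration_number is between 0 and K - 1; no boundary check is needed
    because the table is padded on both sides.
    """
    return table[iteration_number + delta + 1]
```

For K = 3 the table is [2, 0, 1, 2, 0], so the previous area of iteration 0 and the next area of iteration 2 are both found by plain indexing.
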

As an example, consider the procedure in Figure 2-13 to sum the results of a function
evaluated on the first n positive integers. The circulating loop variables are count and total.

Figure 2-12: Iteration Areas and Pointers. Pointers to the iteration areas are stored
contiguously from a known offset within the frame. Having K + 2 pointers to the K iteration
areas is an optimization: If the current iteration number is 0 and the need arises to access the
previous iteration area, the pointer can be found in a straightforward manner, i.e. by looking
one slot earlier than the pointer to the current iteration area. This eliminates costly boundary
condition checks. The PC and import flags, not shown, are packed into the high bits of the pointers.


def combine n f =
  { total = 0
    in
      { for count <- 1 to n do
          next total = (f count) + total
        finally total }}


Figure 2-13: Loop Program Example. Procedure combine applies function f to the first n
positive integers, summing the results. For example, (combine 10 square) would return the
sum of the squares of the numbers from 1 to 10.
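
For readers unfamiliar with Id loop syntax, the sequential meaning of combine can be restated in Python. This is only a reading aid: it says nothing about how iterations are unfolded or overlapped, and square is a helper defined here for the example.

```python
def combine(n, f):
    """Sum f(1) + f(2) + ... + f(n), mirroring the Id loop in Figure 2-13."""
    total = 0
    for count in range(1, n + 1):
        total = f(count) + total    # corresponds to "next total = (f count) + total"
    return total                    # corresponds to "finally total"


def square(v):
    return v * v
```

Here combine(10, square) evaluates to 385, the sum of the squares from 1 to 10.
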
1. Initialize the K iteration pointers.

2. Set the import flag of each iteration area.

3. Set count to 1 and total to 0 in iteration area zero and reset area K - 1’s import flag to
ensure that area zero gets to read count and total before they are written over.

4. Set area zero’s PC flag, which will enable it, as the import flag is already set.

5. For each enabled iteration,

(a) Compare count to n.

(b) If count ≤ n then

i. Write count + 1 into the first slot of the next iteration area and set its PC flag.

ii. Spawn (f count).

iii. Add the result of the previous step to total, writing the result to the total slot
in the next iteration area.

iv. Now done with all incoming circulating variables, set the previous iteration
area’s import flag.

(c) If count > n then write the current value of total to a frame slot outside the iteration
areas.

6. Once the final result has been written to the outside frame slot designated for the finally
value, pass it up to the caller.


Figure 2-14: Pseudo-Code Produced for Loop Example 


Pseudo-code corresponding to the code that would be produced is shown in Figure 2-14.
Figure 2-15 illustrates how this scheme reveals possible parallelism. Up to K invocations of
f will execute at once. If f is slow, this is a big win.

The reader will observe that this scheme does not address nested loops. Those are pulled
out of procedures at compile-time and form new codeblocks that will be called by the original
procedure. Thus inner loops can execute in parallel on separate processors.

Because of a bug in the Id compiler’s interaction with Iannucci’s code, I was unable to
have my compiler support loops. (The version of the Id compiler currently used is different
from the one Iannucci wrote his system to interface with.) For my research, I hand-compiled
loop procedures to explore the different methods of implementation.

Figure 2-15: Snapshots for Loop Example. The snapshots show how the contents of the first
three iteration areas for the program in Figure 2-13 change over time. The first snapshot shows
the values in the iteration areas after they are initialized. The only non-empty locations are
the initial values for count and total in iteration area 0, which has been enabled, as indicated
by the darkened border. In the second snapshot, the first iteration has tested the predicate,
written an incremented count into the next iteration area, and has made the function call. In
the third snapshot, the second iteration does the same. Note that the function calls execute
in parallel.

2.4 Conclusion


Conventions were found to allow Id code to run on the J-Machine in the same style used by
Iannucci on the hybrid architecture. The benefits of this strategy are:

1. Frames allow dynamic dataflow, i.e. every invocation has its own data area.

2. SQs reduce the amount of necessary run-time scheduling.

3. Using multiple phases for instructions with unbounded latency frees the processor for
useful work.

4. Loop unrolling exposes and exploits parallelism.

These powerful techniques are supported at run-time by special data structures, fault handlers,
and library routines. The next chapter describes the compile-time work necessary to convert
from hybrid format to MDP format.

Chapter 3


Compilation 


I have heard of your paintings too, well enough; 

God has given you one face, 
and you make yourselves another. 
You jig, you amble, you lisp... 
— William Shakespeare, Hamlet, Act III, Scene i, line 150.


Because the MDP architecture is so different from the hybrid architecture, substantial
work must be done to create MDP code from hybrid code. Keeping with the philosophy of
the original Id compiler, described below, I perform my transformations in several stages.
The intermediate forms my compiler recognizes or produces are: 

• Hybrid code. 

• Complex MDP code, machine instructions whose opcodes are the same as those on the 
MDP (with a few extensions) but whose addressing modes, etc., are not legal. 

• Simple MDP code, s-expressions of legal MDP instructions. 

• MDP assembly code. 

My back-end converts from the first form to the last. The rest of the chapter describes this
process.

Figure 3-1: Structure of the Id-to-MDP Compiler: Plain roman text indicates modules of the
original Id-to-hybrid compiler, italics indicate modules I changed, and bold indicates modules
I added. Program graphs are a form of dataflow graph. This picture is modeled after one in
[Iannucci 1988, page 97].


The original Id compiler is written in Common Lisp and is based on the Dataflow Compiler
Substrate [Traub 1986b], a set of abstractions for building modular compilers. Each module
inputs and outputs a stream of Lisp objects (except for the first and last modules which only
emit or collect, respectively). Figure 3-1 shows how my modules fit on top of the Id compiler.
Figure 3-2 shows the formats of instructions flowing through all of the new or changed stages.
They will be explained in more detail below. The appendices contain complete listings of the
files I created.

Dataflow Graph Nodes
Iannucci’s internal hybrid format
A stream of hybrid instructions
Instructions with MDP operators (or one of a few pseudo-ops) but illegal operands
Legal MDP instructions in s-expression form
Legal MDP assembly code

Figure 3-2: New and Modified Compiler Stages: Dataflow code flows through several stages
in order to become MDP assembly code. The term “VND” is used to distinguish Iannucci’s
internal representation of code from my “hybrid” format. The ellipses between the first two
stages indicate that other stages go between them.

3.1 Changes to Machine Code Generation


The machine code generation module, called generate-vnd-instructions and written by
Iannucci, takes program graph instructions and converts them to hybrid instructions. In some
cases, such as for arithmetic instructions, the transformation is trivial. For conditionals, loops,
and procedure calls, however, a single program graph instruction expands into many hybrid
instructions. Because my control structure transformations for loops and procedure linkage
differ from Iannucci’s, I wrote a file changes.lisp that replaced his templates for loops and
procedure calls with my own.

3.1.1 Loops 

Originally, for the loop program graph instruction, instructions were generated to support the
hybrid architecture’s implementation of loops. Section 2.3.3 describes how my implementation
differs. I emit different hybrid instructions for the loop set-up instruction to initialize the
iteration area pointers. Code within loop SQs is passed through unchanged, to be converted
in later stages of the compiler, as only structural changes are made in this module.

3.1.2 Procedure Calls 

Section 2.3.2 described my multi-phase convention for procedure linkage, but it glossed over
a few details. Specifically, my implementation differs from the hybrid one in an important
way: On the hybrid architecture, the get-context instruction calls a local manager that selects
a frame on another processor where the procedure can be spawned [Iannucci 1988, page 174].
This requires a processor to know memory usage on other processors. When designing the
system for the J-Machine, I decided each processor should know as little as possible about the
other processors, particularly because the J-Machine is massively parallel. One consequence
was that I rejected this scheme. Instead, I changed the protocol so that get-context is a two-
phase instruction, where the calling node, A, asks the called node, B, for a frame address.
The complete calling protocol is:

1. Execute a get-context instruction on A. This sends a request to processor B to allocate a
frame and start execution of the appropriate procedure, and to send the frame descriptor
F back to processor A.

2. Compute the return location for the procedure call (an offset into the current frame)
and send it to B, attached to F. Because F is the frame descriptor, B will know where
to put the return location.

3. Send each of the arguments to B, attached to F.

If get-context were merely local, no data faults would occur during the first three steps; hence,
the value for the return location could be written into a register instead of a more permanent
place like a frame slot. In my strategy, a fault will occur during step 2 because F is not locally
available yet. Hence, I must insert a suspensive check for F before the second step. This way,
it will be safe to store the return location into a register. There will be no danger that a fault
will occur on F between the time the register is written and when the register is accessed to
send its value to B. (The values in registers are not guaranteed between suspensions, and it
would have been too difficult for me to change the hybrid compiler’s frame allocation.)

Even this is not the whole story. Consider a doubly-recursive procedure like a naive
implementation of Fibonacci. Figure 3-3 shows the code that would be produced by the J-
Machine strategy just described. The problem with this code is that the second get-context
request would not be made until after the first one returns. This introduces unnecessary
dependences, as it implies that steps 5-8 in the figure cannot occur until steps 1-4 are finished.
This was not a problem on the hybrid architecture, where it was known that steps 1-4 would
not suspend. Because step 2 will suspend, steps 5-8 will be delayed unnecessarily. This is
illustrated in Figure 3-4. The arrow indicates the short-cut that exists: The second request
can be started immediately after the first. Hence, before the suspensive check, we add an
instruction to fork a continuation corresponding to whatever follows the procedure call,
essentially splitting the SQ.


3.2 Assembling Hybrid Code 

The last stage of Iannucci’s compiler is an assembler that converts his internal representation
of hybrid code into one suitable for his interpreter. I modified this stage to produce a stream
of hybrid instructions suitable for my stages.

1. Execute get-context for the first recursive call. The value for the frame F1 will be
returned at some unknown time.

2. Make a suspensive reference to F1, so that we can’t get to the next step unless it has
arrived.

3. Compute the return location for the first procedure call and send it to B1, attached to
F1.

4. Send the arguments to B1 attached to F1.

5. Execute get-context for the second recursive call. The value for the frame F2 will be
returned at some unknown time.

6. Make a suspensive reference to F2, so that we can’t get to the next step unless it has
arrived.

7. Compute the return location for the second procedure call and send it to B2, attached
to F2.

8. Send the arguments to B2 attached to F2.


Figure 3-3: A Non-Optimal J-Machine Calling Convention. B1 and B2 represent the two
processors on which the subprocedures are spawned. The code is non-optimal, because F2
would not be requested until after F1 had been received.

Figure 3-4: The Ordering Specified by Successive Function Calls. Unless the first instruction
explicitly forks the second request, as shown by the arrow, code will execute sequentially as
indicated by the plain lines. This unnecessarily lessens the amount of exploited parallelism.

3.3 Convert Hybrid to Complex J 

“Complex J” code is an intermediate format that is relatively easy to produce from hybrid
code. The steps for converting an instruction are:

1. If any operand is suspensive,

(a) Emit: (suspensive-instruction)

(b) For every possibly-suspensive operand s, emit: (suspensive-operand s)

(c) Emit: (suspensive-check-done)

2. Convert all references to hybrid general-purpose registers to references to temporary
storage on the MDP.

3. Emit code specified by the template corresponding to the hybrid instruction.

Below, I describe the different templates for classes of hybrid instructions, in order to provide
a deeper understanding of the hybrid instruction set as well as of the transformation process.
In this section, I go into considerable detail. Readers are prewarned, lest they fall off the
bottom of this depth-first search. Casual readers may wish to read the first few templates
and then skip to the conclusion of this section (Section 3.3.8).

3.3.1 Label Instruction 

The template for converting a label instruction is: 

(defconversion label :label (label-name)
  `((label ,label-name)
    (move (:message (:base 1)) (:j-register A2))))

The first line generates a MDP label with the same name as the hybrid label. The second line
says to move the value at offset one from the current message, i.e. the frame address, into
MDP address register A2.* That line is there because execution can begin at any label, and
A2 is always assumed to hold the frame base pointer.

This example illustrates one of the differences between complex and simple MDP code:
On the J-Machine, one of the operands of a move must be a general-purpose register. The
above move will be broken into two moves in the next stage, convert-cj-to-sj. At this stage,
we do not have to concern ourselves with such details.

3.3.2 Simple Arithmetic Instructions 

The template for converting an arithmetic instruction such as add is:

(defconversion j-add :+ (s1 s2 d)
  (append (lookup-into d)
          `((add ,s1 ,s2 ,d))))

The lookup-into routine generates code to restart any continuations waiting for a value to
be written to location d, as described in Section 2.2.4. First, the slot number is copied into
R1, then the library routine lookup-vector is called.† Figure 3-5 shows the conversion of an
addition instruction.

*In the hybrid and MDP assembly formats, (move A B) moves the contents of A into B, not vice versa.

†In retrospect, explicitly mentioning the register to pass the argument in at this stage is an unnecessary
violation of abstraction.

(:add (:frame (:base 6) :suspensive)
      (:literal (:integer 1))
      (:frame (:base 7)))

(suspensive-instruction)
(suspensive-operand (:frame (:base 6)))
(suspensive-check-done)
(move 7 (:j-register R1))
(call lookup-vector)
(add (:frame (:base 6))
     (:literal (:integer 1))
     (:frame (:base 7)))


Figure 3-5: The Hybrid-to-Complex-J Conversion of an Addition. Execution will only get
past the suspensive-operand virtual instruction if slot 6 of the current frame is present.
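
The conversion shown in Figure 3-5 can be mimicked outside the compiler. The sketch below is not the compiler's code: it represents s-expressions as Python tuples, covers only the add instruction, and skips step 2 (rewriting hybrid registers), and all function names are hypothetical.

```python
def strip_suspensive(operand):
    """Drop the :suspensive marker from an operand."""
    return tuple(part for part in operand if part != ":suspensive")


def lookup_into(dest):
    """Code that restarts continuations waiting on a frame slot."""
    if dest[0] == ":frame":
        slot = dest[1][1]                      # (:frame (:base n)) -> n
        return [("move", slot, (":j-register", "R1")),
                ("call", "lookup-vector")]
    return []                                  # non-frame destinations need no lookup


def convert_add(instr):
    """Convert one hybrid (:add s1 s2 d) instruction to complex J code."""
    opcode, s1, s2, d = instr
    assert opcode == ":add"
    out = []
    suspensive = [op for op in (s1, s2, d) if ":suspensive" in op]
    if suspensive:                             # step 1: suspensive prologue
        out.append(("suspensive-instruction",))
        out.extend(("suspensive-operand", strip_suspensive(op)) for op in suspensive)
        out.append(("suspensive-check-done",))
    s1, s2, d = (strip_suspensive(op) for op in (s1, s2, d))
    out.extend(lookup_into(d))                 # step 3: the j-add template
    out.append(("add", s1, s2, d))
    return out
```

Applied to the hybrid instruction at the top of Figure 3-5, this produces the six complex J instructions listed beneath it.
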


3.3.3 Complicated Arithmetic Instructions 

Some arithmetic instructions are more complicated, such as abs, min, and max, because they
are machine instructions on the hybrid architecture but not on the J-Machine. Thus they have
larger templates that use temporary registers. Figure 3-6 shows the template for abs. The
reserve and free pseudo-ops tell the next stage of the compiler where MDP registers should
be allocated. Without this facility, the conversion of templates requiring temporary storage
would be much less efficient. They will be discussed in more detail in the section on the next
stage of the compiler.
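
The abs template in Figure 3-6 uses a branch-free shift, exclusive-or, and subtract sequence. The sketch below checks that arithmetic in Python, assuming the operand fits in a signed 32-bit word; Python integers do not overflow, so the behavior of the most negative value on real hardware is not modeled.

```python
def abs_via_shift(s):
    """Absolute value using the shift/xor/subtract sequence.

    Assumes -2**31 <= s < 2**31, so that an arithmetic shift right by 31
    leaves only the sign: 0 for non-negative s, -1 (all ones) for negative s.
    """
    mask = s >> 31           # plays the role of scratch1 in the template
    flipped = s ^ mask       # scratch2: s itself, or the bitwise complement of s
    return flipped - mask    # s - 0, or (~s) + 1, which equals -s
```

For example, with s = -5 the mask is -1, the exclusive-or gives 4, and subtracting the mask gives 5.
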

3.3.4 Move Instructions 

The template for converting a move instruction is: 

(defconversion move :move (source dest)
  (append (lookup-into dest)
          `((move ,source ,dest))))

If the destination is a frame slot, this generates code to restart any continuations waiting on
the value and then performs the move.

(defconversion j-abs :abs (s d)
  (append (lookup-into d)
          `((reserve (:register scratch1))
            (reserve (:register scratch2))
            (ash ,s -31 (:register scratch1))
            (xor ,s (:register scratch1) (:register scratch2))
            (sub (:register scratch2) (:register scratch1) ,d)
            (free (:register scratch1))
            (free (:register scratch2)))))


Figure 3-6: The Template for Converting Absolute Value. Two scratch registers must be
reserved for the optimal absolute value strategy. They are used for temporary values and are
freed at the end of the template. The reserve and free are instructions to later stages of the
compiler and do not directly produce any code.

The move-remote instruction moves a value into a slot of another frame. Its template is:

(defconversion movr :move-remote (frame-ptr offset value)
  `((send0 ,frame-ptr)           ; Node number
    (send0 (:ref local.movr))    ; MSG word
    (send0 ,frame-ptr)           ; First argument: frame descriptor
    (send0 ,offset)              ; Second argument: offset within frame
    (sende0 ,value)))            ; Third argument: value to write


On the J-Machine, the first word of a send sequence is a number specifying the destination
node. The second word, the message header, specifies both how long the message is and the
address of the handler to receive it. The meaning of subsequent words is determined by the
handler.

To understand the above template, recall from Section 2.2.3 that the node number is
stored in the low sixteen bits of the frame descriptor. Because the router only looks at the
low sixteen bits, sending the frame descriptor specifies the correct destination node. When
the message reaches that node, execution will begin at the local.movr library routine, which
writes the passed value into the specified slot after checking if any continuations are waiting.
The move-remote instruction is typically used for passing arguments and return values.
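
The reason the frame descriptor can double as the destination word can be illustrated with plain integers. This is a simplified sketch: tag bits are ignored, the header word is reduced to a handler name (the real header also encodes the message length), and the function names are hypothetical.

```python
NODE_BITS = 16
NODE_MASK = (1 << NODE_BITS) - 1


def make_fd(local_address, node):
    """Pack a frame descriptor: local address high, node number in the low 16 bits."""
    return (local_address << NODE_BITS) | (node & NODE_MASK)


def route(first_word):
    """Model of the router: only the low sixteen bits select the destination node."""
    return first_word & NODE_MASK


def move_remote_message(fd, offset, value, handler="local.movr"):
    """The five words sent by the movr template, in order."""
    return [fd, handler, fd, offset, value]
```

Because route ignores the high bits, the same word serves both as the destination (first word) and as the frame descriptor argument (third word).
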
(:test-1 (:frame (:base 6) :suspensive)
         (:frame (:base 8)))

↓

(suspensive-instruction)
(suspensive-operand (:frame (:base 6)))
(suspensive-check-done)
(move 8 (:j-register R1))
(call lookup-vector)
(move true (:frame (:base 8)))


Figure 3-7: The Hybrid-to-Complex-J Conversion of a Test-1. Despite the template’s
apparently ignoring the source, the instruction is converted correctly. Before the template is even
considered, code is emitted to check for the suspensive operand.


3.3.5 Test Instructions 

The hybrid architecture includes the test-1 and test-2 instructions to write true into the
destination if the source(s) are present. Execution should suspend if any source is unavailable.
The template for test-1 is simply:

(defconversion tst1 :test-1 (s1 dest)
  (append (lookup-into dest)
          `((move (:tagged-literal ,boolean-tag 1) ,dest))))

The transformation for test-2 is identical. The simplicity lies in how the converter handles
suspensive arguments: Before the template stage is even reached, code will have been emitted
to check suspensive operands and to suspend if they are not present. Figure 3-7 shows the
conversion of a test-1 instruction.

3.3.6 Continuation Instructions 

Two hybrid instructions exist to fork continuations. They are used to start SQs within a
codeblock. The template for the continue instruction is:

(defconversion cntn :continue (cont)
  `((send0 (:j-register NNR))
    ; Convert it from (:literal (:symbol :SQ-1)) to (:ref :SQ-1)
    (send0 (:ref ,(second (second cont))))
    (sende0 (:j-register A2))))

This sends a message from a processor to itself (the NNR register holds a processor’s own
node number), along with the specified SQ base and the current frame pointer, kept in A2.

While the continue instruction is sufficient, it is non-optimal, in that the new continuation
is likely to immediately suspend on the first value it checks for. With this observation,
Iannucci designed the continue-test instruction which tests the first slot accessed by the new
SQ. If the value is there, the continuation is forked as above; otherwise, a local continuation
is immediately created and stored in the appropriate slot. This saves a message send in the
worst (and most common) case. The conversion template is:

(defconversion cntt :continue-test (check-slot cont)
  ; Convert it from (:literal (:symbol :SQ-1)) to (:ref :SQ-1)
  `((move (:ref ,(second (second cont))) (:j-register R0))
    (move (:literal ,(frame-base-offset check-slot)) (:j-register R1))
    (call (:literal ,cntt-vector))))

This calls a local library routine, cntt, that does the check and, depending on whether or not
the data is present, either sends the message or stores the continuation. The cntt routine
expects R0 to hold the SQ address and R1 to hold the number of the needed slot.
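
The decision made by the cntt routine can be sketched as follows. This is a schematic model, not the library routine itself: the frame is a dictionary of written slots, deferred continuations are kept in a separate dictionary rather than in the slot, and the function name and return strings are invented for the example.

```python
from collections import deque


def continue_test(frame, deferred, queue, check_slot, sq):
    """Fork sq now if check_slot is present; otherwise park it on that slot.

    frame:    slot number -> value, for slots already written
    deferred: slot number -> continuations waiting for that slot
    queue:    the local message queue of SQs ready to run
    """
    if check_slot in frame:
        queue.append(sq)                  # same effect as a plain continue
        return "forked"
    # Data not present: store the continuation instead of sending a
    # message that would only suspend as soon as it started running.
    deferred.setdefault(check_slot, []).append(sq)
    return "deferred"
```

A later write to the slot would then restart the parked continuation, as the lookup-into code does for ordinary stores.
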

3.3.7 Procedure Linkage Instructions 

The procedure linkage convention was described in great detail in Sections 2.3.2 and 3.1.2. 
Briefly, there are three steps to spawning a procedure: 

1. Initiate a get-context request, sending the codeblock descriptor and the address of where
to write the new context pointer.

2. Use index-current-context to create a new global address for return values to be sent to.
For example, if the first return value should be sent to slot 8, index the current context
by 8.

3. Perform remote moves to transfer the indexed context and the arguments into the
newly-allocated frame.

The third step uses the move-remote instruction described earlier. The transformations for
get-context and index-current-context for the first two steps are described here.

Get-Context The transformation for the get-context instruction appears in Figure 3-8.
Rather than try to explain it here, I have added detailed comments to the code. As mentioned
earlier, no attempt at load balancing is made by the compiler. A library routine, get-context,
resides on every processor to use the information sent and to perform the callee’s half of the
protocol.


Index-Current-Context The Index-Current-Context instruction is slightly more complicated.
By convention, the n return values of a procedure are sent to the first n slots of the
calling frame. Because we really never want the return values sent to the start of the current
frame, we increment the current context and send that value to the callee instead. The
template is shown in Figure 3-9.


3.3.8 Conclusion 


In the convert-hybrid-to-cj stage of the compiler, hybrid instructions are transformed into
complex J-Machine code. The transformations ignore the intricacies of MDP addressing modes,
making the transformation process simpler and more conceptual. Several pseudo-operators
for handling suspensive instructions and register allocation are used.

(defconversion getc :get-context (context-slot return-slot)
  ;; The first scratch register will be used to hold the global
  ;; frame descriptor of the calling frame, so that the callee
  ;; knows where to send the context value back to. Recall that
  ;; the format of a FD is that the local address is in the high
  ;; sixteen bits, and the node number is in the low sixteen.
  `((reserve (:register scratch))
    ; Take the local address of the current frame from A2.
    (move (:j-register A2) (:register scratch))
    ; Tag it as an integer (instead of an address) so we can munge it.
    (wtag (:register scratch)
          (:literal ,int-tag)
          (:register scratch))
    ; Shift it over 16, to fit into FD format.
    (lsh (:register scratch)
         (:literal ,(- 16 *sys-len-bits*))
         (:register scratch))
    ; Add in the local node number (i.e. put it in low 16 bits).
    (add (:register scratch) (:j-register NNR) (:register scratch))
    ; Tag it as a FD.
    (wtag (:register scratch) (:literal ,fd-tag) (:register scratch))
    (send0 (:literal 1))            ; Send to node 1 always
    (send0 (:ref local.getc))       ; Handler is the local.getc lib routine
    (send0 ,context-slot)           ; Send the codeblock descriptor.
    (send0 (:register scratch))     ; Send the current FD, so it knows
    (free (:register scratch))      ; where to send the context back to.
    (sende0 ,(frame-base-offset return-slot)))) ; Send the return offset.

Figure 3-8: Transformation of the Get-Context Instruction to MDP Code. The purpose of the
get-context instruction is to send off a request to allocate a context and return its value.

(defconversion ixcc :index-current-context (frame-base dest)
  (append (lookup-into dest)
          ; A scratch register is needed
          `((reserve (:register scratch))
            ; Move the local frame address into the scratch register
            (move (:j-register A2) (:register scratch))
            ; Tag it as an integer so we can adjust it
            (wtag (:register scratch)
                  (:literal ,int-tag)
                  (:register scratch))
            ; Add in the new base, shifted over into the address
            ; portion of the instruction
            (add (:register scratch)
                 (:literal ,(* (literal-base-offset frame-base)
                               (expt 2 *sys-len-bits*)))
                 (:register scratch))
            ; Shift the sum into the top half of the word
            (lsh (:register scratch)
                 (:literal ,(- 16 *sys-len-bits*))
                 (:register scratch))
            ; Add the local node number into the low half of the word
            (add (:register scratch)
                 (:j-register NNR)
                 (:register scratch))
            ; Tag it as a frame descriptor
            (wtag (:register scratch)
                  (:literal ,fd-tag)
                  (:register scratch))
            ; Move it into the specified destination.
            (move (:register scratch) ,dest)
            ; Free the scratch register.
            (free (:register scratch)))))

Figure 3-9: Transformation of Index-Current-Context. The purpose of index-current-context
is to take the address of the current frame, conceptually add a constant offset to it, and
convert it to frame descriptor format. It can then be sent to a spawned procedure as the frame
to return results to.

3.4 Convert Complex J to Simple J

This stage is the most complex of the new modules. Its tasks include:

1. Converting literal operands into tagged literals.

2. Converting the suspensive-instruction, suspensive-operand, and suspensive-check-done
pseudo-ops into MDP code.

3. Allocating and substituting MDP registers where they were requested with the reserve
and free pseudo-ops.

4. Adjusting instructions to use legal MDP addressing modes.

We will examine each of these stages.

3.4.1 Converting Literals to Tagged Literals 

Because all values on the MDP are tagged, references to literals must be changed to tagged
literals. The integer literal operands from the addition example in Figure 3-5 would both be
converted:

7 -> (:tagged-literal int-tag 7)

(:literal (:integer 1)) -> (:tagged-literal int-tag 1)

Booleans and labels are similarly transformed.

The other type of “literal” used is a reference: a constant whose value is determined
at assemble-time [Horwat and Totty 1987, page 9]. References are used to denote codeblock
pointer values, addresses of suspensive instructions, and branch destinations. These are
denoted with the imaginary tag name, “special-tag”. These operands are converted to MDP
reference format in the last stage of the compiler.
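
The literal-tagging pass can be sketched as a small recursive-descent helper. The Python below is an illustration under stated assumptions, not the compiler's Lisp. S-expressions are modelled as nested tuples; the tag names `int-tag` and `special-tag` come from the text, while `bool-tag` and the `:boolean` literal kind are hypothetical names chosen for the sketch.

```python
# Illustrative sketch: operands are nested tuples standing in for s-expressions.
# "bool-tag" and ":boolean" are assumed names; "int-tag" and "special-tag"
# are the ones used in the text.

def tag_literal(operand):
    """Rewrite a literal operand as (:tagged-literal <tag> <value>)."""
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(operand, bool):
        return (":tagged-literal", "bool-tag", operand)
    # A bare integer such as 7 is treated as an integer literal.
    if isinstance(operand, int):
        return (":tagged-literal", "int-tag", operand)
    if isinstance(operand, tuple) and operand and operand[0] == ":literal":
        kind, value = operand[1]
        if kind == ":integer":
            return (":tagged-literal", "int-tag", value)
        if kind == ":boolean":
            return (":tagged-literal", "bool-tag", value)
        if kind in (":symbol", ":code-block"):
            # References are resolved at assemble time, so they carry the
            # imaginary special-tag until the last stage.
            return (":tagged-literal", "special-tag", value)
    # Registers, frame slots, and message slots are left untouched.
    return operand


if __name__ == "__main__":
    print(tag_literal(7))
    print(tag_literal((":literal", (":integer", 1))))
    print(tag_literal((":literal", (":symbol", ":ELSE-4"))))
```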

3.4.2 Generating Suspensive Code 

Before a suspensive instruction, several things must be done to ensure proper behavior:

1. Store the current instruction pointer location into R0, so if a fault occurs, the handler
will know where execution should resume.

2. Because execution could be resumed here, SQ setup code must be emitted to load
the frame base address into MDP register A2, i.e. (move (:message (:base 1))
(:j-register A2)).

3. Check whether each suspensive operand is present, faulting if not.

For the rationale behind these rules, refer back to Section 2.2.4, where the continuation format
was described. Figure 3-10 shows the conversion of the suspensive pseudo-ops in the add
instruction introduced in Figure 3-5. First, a unique label, created with the Lisp procedure
gensym, is emitted. A reference to it is loaded into R0 with the DC (“data constant”)
instruction. The frame base is loaded into A2, after which the tag of the suspensive operand
is read. If it faults, the run-time handler described in Section 2.2.4 will set up a continuation.

(suspensive-instruction)
(suspensive-operand (:frame (:base 6)))
(suspensive-check-done)

(label (:tagged-literal special-tag (:label suspensive19)))
(dc (:tagged-literal special-tag :suspensive19))
(move (:message (:base 1)) (:j-register A2))
(rtag (:frame (:base 6)) (:j-register R3))

Figure 3-10: Intermediate Code Produced for Suspensive Pseudo-Operands. The DC (“data
constant”) instruction loads its assemble-time constant operand into R0. If the rtag (“read
tag”) instruction faults, the handler can use the R0 value to know where execution should
restart, as described in Section 2.2.4.
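
The expansion of the suspensive pseudo-ops can be sketched as a function that emits the four kinds of instruction listed above. This Python model is illustrative only. The `gensym` stand-in, the tuple encoding of instructions, and the use of R3 as the scratch destination for every tag read are assumptions made for the sketch (R3 simply echoes Figure 3-10).

```python
import itertools

# Counter behind the gensym stand-in; starting at 19 only echoes Figure 3-10.
_label_counter = itertools.count(19)


def gensym(prefix="suspensive"):
    """Stand-in for Lisp's gensym: return a fresh, unique label name."""
    return f"{prefix}{next(_label_counter)}"


def expand_suspensive(operands):
    """Emit the preamble for a hybrid instruction whose listed operands
    are suspensive. Instructions are modelled as plain tuples."""
    label = gensym()
    code = [
        # 1. A unique label marks the restart point ...
        ("label", (":tagged-literal", "special-tag", (":label", label))),
        # ... and DC loads a reference to it into R0 for the fault handler.
        ("dc", (":tagged-literal", "special-tag", ":" + label)),
        # 2. SQ setup: reload the frame base, since execution may resume here.
        ("move", (":message", (":base", 1)), (":j-register", "A2")),
    ]
    # 3. Read the tag of each suspensive operand; an absent value faults.
    for operand in operands:
        code.append(("rtag", operand, (":j-register", "R3")))
    return code


if __name__ == "__main__":
    for instruction in expand_suspensive([(":frame", (":base", 6))]):
        print(instruction)
```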

Although it would be more efficient not to explicitly read the tags of the suspensive
operands, it is necessary if the hybrid instruction has side effects. For example, a
destination might be written or a message might be sent before a specific suspensive operand was
accessed. A later version of this compiler would optimize out the “read tag” instructions in
cases where the implicit check (the fault taken when the operand is actually accessed) would
suffice.

3.4.3 Allocating MDP Registers

MDP registers have two uses: passing arguments to system calls and holding temporary values
within hybrid instructions. When used for system calls, they are explicitly referred to as in
Figure 3-5 earlier. When they are used as temporaries, generally it does not matter which of
the four MDP general-purpose registers is used. The reserve and free pseudo-ops generated
by the templates in convert-hybrid-to-cj are used to create and destroy bindings of symbols
to MDP registers. For example,

(reserve (:register scratch))

binds scratch to a free MDP register. Until a

(free (:register scratch))

is encountered, all occurrences of (:register scratch) are converted to (:j-register Rn),
where n is the register bound to scratch. Because no more than four temporary registers are
ever needed, no spilling needs to be done.

The only conflict arises because R0 is different from the other GPRs. The MDP instruction
DC loads a 32-bit quantity into R0.* Except for a few special values, only 7-bit quantities can
be specified as constants to move directly into the other registers. Thus there is an internal
compiler routine, request-appropriate-register, that takes an argument specifying what will go
in the register and returns a binding to an appropriate register, i.e. R0 if the argument is
a big value, another register otherwise. If R0 has already been allocated, an instruction to
move the old contents of R0 into another register is generated, and the previous binding to
R0 is changed. This process is illustrated in Figure 3-11.†

3.4.4 Converting to Legal MDP Operands

Instructions on the MDP are only 17 bits long. While this permits tight packing and quick
loading, it limits the operand space. Specifically, general-purpose registers are required as
operands to certain instructions, and only very short constants can be encoded in instructions.

* DC is more accurately an assembler pseudo-op. It must have a constant value for its operand which is
then put directly into the instruction stream. During execution, if the instruction pointer is at something that
is not tagged instruction, it is loaded into R0. This allows 32-bit values to be directly loaded into a register,
despite the normal 17-bit instruction length.

† Nate Osgood helped me develop this one-pass register allocation scheme.

Request                               Bindings        Code Emitted

(request-appropriate-register 100)    reg91 -> R0
  => reg91

(request-appropriate-register 11)     reg91 -> R0
  => reg92                            reg92 -> R3

(request-appropriate-register 500)    reg91 -> R2     (move (:j-register R0)
  => reg93                            reg92 -> R3           (:j-register R2))
                                      reg93 -> R0

Figure 3-11: Compiler Register Allocation. Requests for registers and the return values
are shown in the leftmost column. The binding names are generated by the Lisp gensym
procedure. The middle column shows the internal set of bindings after each instruction. A
conflict arises on the third request where R0 is needed but is already part of another binding,
reg91. The register allocator emits code to move whatever has been placed in R0 into a
previously-free register, R2. The binding for reg91 is then changed to R2, and the new
request can get R0.
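
The one-pass register allocation scheme of Section 3.4.3 can be modelled in Python so that it reproduces the three requests of Figure 3-11. This is a sketch under assumptions, not the compiler's routine. It treats a "big value" as anything outside a signed 7-bit range, and it hands out temporaries in the order R3, R2, R1 (an order inferred from the figure). The class and method names are hypothetical.

```python
def fits_short_constant(value, bits=7):
    """True if value fits a signed immediate of the given width.
    The signed 7-bit interpretation is an assumption of this sketch."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


class RegisterAllocator:
    """One-pass allocator for the four MDP general-purpose registers."""

    TEMP_ORDER = ("R3", "R2", "R1")   # assumed preference order

    def __init__(self):
        self.bindings = {}    # symbolic name -> MDP register
        self.emitted = []     # moves emitted to resolve R0 conflicts
        self._next_id = 91    # arbitrary start, echoing Figure 3-11

    def _owner_of(self, register):
        for name, reg in self.bindings.items():
            if reg == register:
                return name
        return None

    def _free_temp(self):
        for reg in self.TEMP_ORDER:
            if self._owner_of(reg) is None:
                return reg
        return None

    def request(self, value):
        """Bind a fresh name to a register appropriate for value."""
        name = f"reg{self._next_id}"
        self._next_id += 1
        temp = self._free_temp()
        if fits_short_constant(value) and temp is not None:
            self.bindings[name] = temp
            return name
        # The value needs R0; evict the current holder of R0 if there is one.
        holder = self._owner_of("R0")
        if holder is not None:
            if temp is None:
                raise RuntimeError("all four registers are in use")
            self.emitted.append(
                ("move", (":j-register", "R0"), (":j-register", temp)))
            self.bindings[holder] = temp
        self.bindings[name] = "R0"
        return name

    def free(self, name):
        del self.bindings[name]


if __name__ == "__main__":
    allocator = RegisterAllocator()
    for value in (100, 11, 500):
        allocator.request(value)
        print(value, dict(allocator.bindings))
    print(allocator.emitted)
```

Because bindings are rewritten rather than values spilled, the scheme stays one-pass: later occurrences of the evicted name simply resolve to its new register.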

Consider the following hybrid instruction: 

(:add (:frame (:base 6)) 

(:literal (:integer 30)) 

(:frame (:base 7))) 

There are two reasons why it cannot be encoded into one MDP three-operand instruction:

1. The first and last operands must be general-purpose registers.

2. If the second operand is a constant, it must be in the range [-16...15].

The above add instruction would be translated into four MDP instructions:

(move (:frame (:base 6))
      (:j-register R3))
(move (:tagged-literal int 30)
      (:j-register R2))
(add (:j-register R3)
     (:j-register R2)
     (:j-register R3))
(move (:j-register R3)
      (:frame (:base 7)))


The astute reader will have observed that if the order of the source operands were changed,
they could be encoded into one fewer MDP instruction. I did not have time to incorporate this
optimization for commutative instructions.

As another example, consider a hybrid instruction to move an immediate into a frame slot: 

(:move (:literal (:integer 600)) 

(:frame (:base 20))) 


Because 600 is more than seven bits long, it must be loaded into R0 through the DC
instruction:

(dc (:tagged-literal int 600))
(move (:j-register R0)
      (:frame (:base 20)))

Like immediates, offsets from the frame base can only be five bits in three-operand instructions
and seven bits in two-operand instructions. If the destination of the above move had an offset
of 100 instead of 20, the code would be:

; (:move (:literal (:integer 600)) (:frame (:base 100)))
(dc (:tagged-literal int 600))
(move (:j-register R0)
      (:j-register R3))
(dc (:tagged-literal int 100))
(move (:j-register R3)
      (:frame (:base (:j-register R0))))

This illustrates the R0 conflict maneuver described in Section 3.4.3.
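
The two move examples can be reproduced by a small legalization routine. The Python below is a sketch, not the compiler's code. It assumes that both the immediate and the frame offset of a two-operand move must fit a signed 7-bit field, and it always uses R3 as the spare register; the function names are hypothetical. Only the cases discussed in the text are modelled.

```python
R0 = (":j-register", "R0")
R3 = (":j-register", "R3")   # spare register chosen arbitrarily for the sketch


def fits_signed(value, bits):
    """True if value fits a signed field of the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def legalize_move_immediate(value, offset, imm_bits=7, offset_bits=7):
    """Legalize (:move (:literal (:integer value)) (:frame (:base offset))).

    The 7-bit signed limits are assumptions of this sketch."""
    literal = (":tagged-literal", "int", value)
    code = []
    source = literal
    if not fits_signed(value, imm_bits):
        # A large constant can only enter a register through DC, i.e. via R0.
        code.append(("dc", literal))
        source = R0
    if fits_signed(offset, offset_bits):
        code.append(("move", source, (":frame", (":base", offset))))
        return code
    # The offset is too large as well, so it also has to go through R0.
    # Anything already held in R0 is first moved out of the way.
    if source == R0:
        code.append(("move", R0, R3))
        source = R3
    code.append(("dc", (":tagged-literal", "int", offset)))
    code.append(("move", source, (":frame", (":base", R0))))
    return code


if __name__ == "__main__":
    for instruction in legalize_move_immediate(600, 100):
        print(instruction)
```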


3.5 Convert Simple to ASM 

This last stage converts the code to a format suitable for the MDP assembler. This involves
converting from s-expressions into plain text and translating the operands into a suitable
format. Offsets are converted:

(:register (:base X)) -> [X,A0]

(:frame (:base X)) -> [X,A2]

(:message (:base X)) -> [X,A3]

The first transformation is to convert hybrid registers to temporary storage. On the
J-Machine, accesses off of A0 are absolute addresses. The first twenty words of MDP memory
are devoted to temporary storage, so hybrid register n is stored at absolute address n on the
J-Machine. As on the hybrid architecture, the value is not guaranteed to be the same between
suspensions.

Additionally, assemble-time references must be output properly. When a reference is
encountered as an operand, it is converted:

(:tagged-literal special-tag X) -> {X_msg_ref}

X is also added to a list of references. At the end of compilation, for each reference
X in the list, the following is output:

ref X_msg_ref = MSG:(((X+N_loc)<<10))+2

where N is the name of the procedure. This creates a reference whose value includes the
absolute address of the associated label (labels are usually relative addresses), as well as
specifying that it will be used as a header of a message with two words. (The two words will
be the message itself and the frame value.)
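
The operand and reference conversions of this stage amount to simple string formatting. The following Python sketch shows one way to express them. The function names are hypothetical, operands are again modelled as nested tuples, and the `<<` shift operator in the reference line is an assumption about the assembler syntax.

```python
# Address register used as the base for each operand kind, per the table above.
BASE_REGISTER = {":register": "A0", ":frame": "A2", ":message": "A3"}


def format_operand(operand):
    """Render (:frame (:base X)) and friends as MDP [X,An] text."""
    kind, (_, offset) = operand
    return f"[{offset},{BASE_REGISTER[kind]}]"


def format_reference(label):
    """Render a special-tag literal as a use of its message reference."""
    return "{" + f"{label}_msg_ref" + "}"


def reference_definition(label, procedure):
    """Line emitted at the end of compilation for each collected reference.
    The value combines the label's absolute address with a length of 2."""
    return f"ref {label}_msg_ref = MSG:((({label}+{procedure}_loc)<<10))+2"


if __name__ == "__main__":
    print(format_operand((":frame", (":base", 6))))
    print(format_reference("suspensive19"))
    print(reference_definition("suspensive19", "fact"))
```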

3.6 Conclusion 

In order to convert hybrid code to MDP assembly code, I created intermediate formats and
routines to convert from more complex formats to simpler ones. These are useful not only for
this compiler but as a general-purpose J-Machine utility. An MDP assembly coder or compiler
writer could produce complex J-Machine code and be spared the trouble of remembering how
many bits of operand are available for each instruction. While my register allocation is still
too primitive to give optimal results (for example, the same value could be stored in two
different registers), it is still good enough to provide a new dialect of MDP assembly language
that a programmer might choose for its greater abstraction and simplicity.

Chapter 4


Analysis

“A slow sort of country!” said the Queen. “Now, here, you see, it takes all the
running you can do, to keep in the same place. If you want to go somewhere else,
you must run at least twice as fast as that.”

— Lewis Carroll, Through the Looking-Glass. 

I am pleased with the system, in that it works and reasonable solutions were found for
every problem. However, while some of the mechanisms worked out well, not all turned out to
be as efficient as I would like. In this chapter, I provide a detailed example of code produced
and executed for an Id routine, several benchmark results, and analysis of both my system
and the J-Machine.


4.1 Detailed Benchmark: Factorial 

In this section, I will go into great detail by providing listings and statistics for a sample Id 
procedure. Specifically, I will describe the composition and execution of the simple recursive 
factorial program shown in Figure 4-1. 

4.1.1 The Dataflow Graph 

First, the initial stages of the compiler convert the program into a dataflow graph, such as the
one shown in Figure 4-2. I have abstracted away some of the details in order to highlight the
essential parts of the graph. First, the input arrives at node 1. It is passed through unchanged
by the identity instruction to nodes 2 and 3. Node 2, the predicate, passes a boolean value to
node 3, a switch instruction. The semantics of the switch instruction are such that it passes
its data input to the left output arc if the control input is true, and to the right arc if the
control input is false. Thus, if the predicate is true (i.e. if the argument is less than or
equal to one), the argument itself will be sent to node 9 and returned. In the inductive case,
the argument is sent to identity node 4. Node 7, call fact, makes the recursive call, specifying
that the return value should be sent to node 8, mul. When it arrives, the multiplication is
performed, and a value is sent to node 9 to be returned.

fact n =
  if n <= 1 then
    n
  else
    n * fact (n-1);

Figure 4-1: Id Code for Factorial

Figure 4-2: A Dataflow Graph for Factorial. If an integer n is input to the top identity node,
n! will be computed. The switch node uses its left input as a control signal and its right input
as data. If the control signal is true, data goes to the left output arc; otherwise, to the right.
The identity node copies its inputs to its output arcs. The dotted line from the call node
to the mul node indicates that the connection is indirect. The numbers are for expository
purpose only.

The purpose of showing and describing the graph is to give an idea of how the compiler
looks at a procedure. Iannucci’s stages of the compiler can only see the dataflow graph, not
the source code.


4.1.2 The Hybrid Code 


The hybrid code produced for the factorial example is shown in Figure 4-3. I have added
comments, lines headed with semicolons, to describe the process. Readers uninterested in
such technical detail should skip to Figure 4-4, which shows the SQs’ composition at a higher
level. Figure 4-5 shows frame usage.

;;; SQ-1 does initialization, forks local SQs, and tries to return
;;; the result.
((:LABEL (:LITERAL (:SYMBOL :SQ-1))))
;; Put the codeblock pointer to FACT into [12].
((:MOVE (:LITERAL (:CODE-BLOCK :FACT)) (:FRAME (:BASE 12))))
;; Fork SQ-2, immediately suspending it if [3], n, is empty.
((:CONTINUE-TEST (:FRAME (:BASE 3) :SUSPENSIVE) (:LITERAL (:SYMBOL :SQ-2))))
;; Fork SQ-11, immediately suspending it if [0], the return location,
;; is empty.
((:CONTINUE-TEST (:FRAME (:BASE 0) :SUSPENSIVE) (:LITERAL (:SYMBOL :SQ-11))))
((:LABEL (:LITERAL (:SYMBOL :SEND-RESULT-0))))
;; Pass [6], the return value, up to offset 1 from the calling frame.
((:MOVE-REMOTE (:FRAME (:BASE 0) :SUSPENSIVE)
               (:LITERAL (:INTEGER 1))
               (:FRAME (:BASE 6) :SUSPENSIVE)))
;; Pass [5], a signal ("true"), up to offset 0 from the calling frame.
((:MOVE-REMOTE (:FRAME (:BASE 0))
               (:LITERAL (:INTEGER 0))
               (:FRAME (:BASE 5) :SUSPENSIVE)))
((:TERMINATE))

;; SQ-11 sets [5], the signal, when locations [0] and [7] have data.
((:LABEL (:LITERAL (:SYMBOL :SQ-11))))
((:TEST-2 (:FRAME (:BASE 0) :SUSPENSIVE)
          (:FRAME (:BASE 7) :SUSPENSIVE)
          (:FRAME (:BASE 5))))
((:TERMINATE))

;; SQ-2 evaluates the predicate and runs appropriate code.
((:LABEL (:LITERAL (:SYMBOL :SQ-2))))
;; Put in [4] the result of checking if [3], the argument, is <= 1.
((:<= (:FRAME (:BASE 3) :SUSPENSIVE)
      (:LITERAL (:INTEGER 1))
      (:FRAME (:BASE 4))))
;; If not, branch to ELSE-4.
((:BRANCH-FALSE (:FRAME (:BASE 4)) (:LITERAL (:SYMBOL :ELSE-4))))
;; Copy [3], the argument, into [6], the slot for the result.
((:MOVE-IDENTITY (:FRAME (:BASE 3) :SUSPENSIVE) (:FRAME (:BASE 6))))
;; Copy [4], the predicate result ("true") into [7].
((:MOVE-IDENTITY (:FRAME (:BASE 4)) (:FRAME (:BASE 7))))
;; Branch past inductive case code.
((:BRANCH (:LITERAL (:SYMBOL :END-IF-4))))
;; This code gets executed for the inductive case.
((:LABEL (:LITERAL (:SYMBOL :ELSE-4))))
;; Subtract 1 from [3], the argument, and put the result in [14].
((:- (:FRAME (:BASE 3) :SUSPENSIVE)
     (:LITERAL (:INTEGER 1))
     (:FRAME (:BASE 14))))
;; Spawn the codeblock whose name is in [12] (fact), putting the context
;; value into [10].
((:GET-CONTEXT (:FRAME (:BASE 12)) (:FRAME (:BASE 10))))
;; Specify [8] as the base location for return values for the
;; spawned procedure.
((:INDEX-CURRENT-CONTEXT (:LITERAL (:BASE 8)) (:REGISTER 0)))
;; Send this adjusted context (i.e. the return location) to slot zero
;; in the spawned procedure.
((:MOVE-REMOTE (:FRAME (:BASE 10))
               (:LITERAL (:INTEGER 0))
               (:REGISTER 0)))
;; Send [14], the argument minus 1, to slot three in the spawned procedure.
((:MOVE-REMOTE (:FRAME (:BASE 10))
               (:LITERAL (:INTEGER 3))
               (:FRAME (:BASE 14))))
;; Fork SQ-5, immediately suspending it if [3], the argument, isn’t here.
((:CONTINUE-TEST (:FRAME (:BASE 3) :SUSPENSIVE) (:LITERAL (:SYMBOL :SQ-5))))
;; Fork SQ-8, immediately suspending it if [8], the signal that the spawned
;; procedure is done, isn’t here.
((:CONTINUE-TEST (:FRAME (:BASE 8) :SUSPENSIVE) (:LITERAL (:SYMBOL :SQ-8))))
((:LABEL (:LITERAL (:SYMBOL :END-IF-4))))
((:TERMINATE))

;; SQ-8 frees the context of the spawned procedure if it’s not needed
;; any more.
((:LABEL (:LITERAL (:SYMBOL :SQ-8))))
;; Suspend unless [8], the signal that the spawned procedure is done, is present.
((:TEST-1 (:FRAME (:BASE 8) :SUSPENSIVE) (:REGISTER 0)))
;; Return [10], the context of the spawned procedure, writing true into [11].
((:RETURN-CONTEXT (:FRAME (:BASE 10) :SUSPENSIVE) (:FRAME (:BASE 11))))
;; Copy [11], the true signal, into [7], a signal that all work is done.
((:MOVE-IDENTITY (:FRAME (:BASE 11)) (:FRAME (:BASE 7))))
((:TERMINATE))

;; SQ-5 is spawned only for the inductive case.
((:LABEL (:LITERAL (:SYMBOL :SQ-5))))
;; Multiply [3], the argument, by [9], the value returned by the recursive
;; call, putting the result into [13].
((:* (:FRAME (:BASE 3) :SUSPENSIVE) (:FRAME (:BASE 9) :SUSPENSIVE)
     (:FRAME (:BASE 13))))
;; Move this value into [6], the slot for the return value.
((:MOVE-IDENTITY (:FRAME (:BASE 13)) (:FRAME (:BASE 6))))
((:TERMINATE))

Figure 4-3: Hybrid Code for Factorial.

Figure 4-4: There are five scheduling quanta in factorial. The numbers in the SQ names have
no significance, except that the first is always named SQ-1. Arrows indicate where SQs may
be forked and can be thought of as a subset of data dependences. SQs 5 and 8 are only
spawned in the recursive case. Observe that on its first execution, SQ-1 will fault midway
through, because the return results will not be ready. Execution will restart in the middle of
the SQ.

Slot   Base Case                   Recursive Case
0      FD of return location       FD of return location
1      unused                      unused
2      unused                      unused
3      argument (n)                argument (n)
4      n <= 1 ?                    n <= 1 ?
5      signal, [0] & [7] full      signal, [0] & [7] full
6      result                      result
7      signal, [4] true            signal, [11] full
8                                  signal, rec. call done
9                                  result of rec. call
10                                 context of rec. call
11                                 signal, [10] freed
12                                 fact_codeblock
13
14                                 n - 1

Figure 4-5: Frame Slots Used by Factorial Code. The left frame shows slot usage in the base
case, and the right frame shows slot usage in the recursive case. Signals are flags that are set
to indicate that the described condition has been met; i.e. [5] is explicitly set to true after
values are written to [0] and [7]. Note that the same slot, such as [7], can have a different
meaning for the two mutually exclusive cases.

4.1.3 The MDP Code


The MDP code is included in Appendix A.1. It has all of the same characteristics as the hybrid
code, i.e. the same frame slot assignments and SQs (modulo my slightly different calling
convention). The hybrid code had 28 instructions; the MDP code has 180, not counting code
in library routines. Thus there is an average of 6.4 MDP instructions per hybrid instruction.
This blow-up is not as bad as it seems because an MDP instruction word is roughly one-fourth
the size of a hybrid instruction word.* Part of the growth is thus the accepted expansion factor
between CISCy and RISCy architectures. As the reader will recall, there are two reasons that
one hybrid word expands into many instruction words: First, hybrid instructions are more
powerful and better suited to the special purpose than J-Machine instructions; second, an
expansion occurs to fit the code into the more restrictive MDP addressing modes.

4.1.4 Load Balancing 

As mentioned in Section 2.3.2, my compiler does no load balancing. The user must modify the
code produced by the get-context instruction to spawn procedures to an appropriate processor,
usually a function of the argument(s); otherwise, all calls will go to the same node. Because
factorial is singly recursive, it makes sense to spawn (fact n) onto processor n, because no
task will already be running there. I changed one line of the compiled routine to implement
this. If n were potentially larger than p, the number of processors, we would take its value
modulo p. This would guarantee an even distribution.
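
The placement rule just described is a one-line function. The Python sketch below is purely illustrative (the actual change was a one-line edit to the compiled MDP routine, which is not shown here); the function names are hypothetical, and processors are assumed to be numbered from 0 to p - 1.

```python
def target_node(n, p):
    """Processor on which to spawn (fact n), given p processors
    numbered 0 .. p-1 (an assumption of this sketch)."""
    return n % p


def placement(n, p):
    """Nodes visited by the recursion fact n, fact (n-1), ..., fact 1."""
    return [target_node(k, p) for k in range(n, 0, -1)]


if __name__ == "__main__":
    # Each call in the chain for (fact 4) lands on a different node.
    print(placement(4, 4))
```

Whenever n is no larger than p, every call in the recursion chain lands on its own node, which is the property the text relies on.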

4.1.5 Dynamic Counts 

When I ran (fact 4) on the MDP simulator, it took 1263 ticks for the result to be written
to the original calling frame. A tick is the time unit used by the simulator: One tick equals
one instruction, even though not all instructions on the J-Machine will take the same time.
The simulator also ignores network latency. Four processors were enabled, and utilization was
37%; i.e. on average, a processor did useful work a little over a third of the time. Fault
and library usage is shown in Table 4.1. As the totals show, 84% of the time was spent in
the libraries. The routine that consumed the most time was the cfuture fault handler. It
was called 21 times, and each time took 18 ticks. As described in Section 2.2.4, the cfuture
handler must allocate space to store a continuation and fill in the necessary data. Dynamic
instruction usage (not counting idle cycles) is shown by category in Table 4.2.

* I cannot give an exact length for hybrid words, because the compiler I used was for a paper version of the
architecture where word lengths were essentially unlimited. According to Iannucci in private correspondence,
the word size can be roughly thought of as 64 bits.

Handler Name     Times Called   Ticks/Call     Total Ticks
Lookup           25             5 or 6 + 6w    212
CFUT             21             18             378
Move-Remote      16             13 + 7w        264
Continue-Test    14             7 or 20        189
Get-Context      3              24             72
Allocate         3              12             36
Total            136            n/a            1061

Table 4.1: System Calls for (fact 4). Ranges are specified for the ticks/call column because
the time may depend on the data. For lookup and move-remote, w specifies the number of
waiting continuations. An estimated average number is used (with w = 1/2) to approximate
the total ticks.

Instruction Type   Times Used   Percent   Comments
Move               882          47.8      Both reg-reg and reg-frame
Field              247          13.3      Operations on tags
Network            237          12.8      Sending messages
DC                 159          8.6       Loading constants into R0
Branch             125          6.7       Does not include busy-looping
Fault              87           4.7       Entering and leaving system calls
ALU                61           3.3       ALU ops for program and libraries
NOP                46           2.4       NOPs used as padding to align instructions

Table 4.2: Dynamic Instruction Usage for (fact 4).

The average number of instructions executed per message is 92.6, which is larger than
the 55 instructions per message empirically found by Horwat in [Horwat 1989, page 104]. His
Concurrent Smalltalk version of the same factorial program takes only 315 ticks to complete
[Horwat 1989, page 110], compared to my 1263.

                        Ticks/Skewed Calls          Ticks/Nonskewed Calls
Argument   Ticks/Call   1st Result   2nd Result     1st Result   2nd Result
4          1263         1864         2163           1992         2271
8          2691         4204         4590           4332         4611
12         4119         6544         6846           6672         6951

Table 4.3: Throughput for Factorial. This table compares the number of ticks required to
compute one and two calls of factorial. For each case, the number of processors used is the
same as the argument. The first data column shows how long it takes for one call executing
alone to complete. The second set of columns shows how long it takes to complete two factorial
calls made at the same time, skewed among the enabled processors. The last set of columns
shows the completion times when the two calls are not skewed among the processors.


4.1.6 Throughput 

One reason for the high latency is that, at every design decision, throughput was favored over
latency. This is due to the decision to break apart any transaction of unbounded latency, which
increased the latency of tasks but improved throughput. Table 4.3 shows that computing two
invocations of factorial concurrently on the J-Machine takes significantly less than twice as
much time as computing a single call. This is true for two reasons:

1. Each task suspends itself when it is waiting for a result from another processor.

2. The factorial calls can be skewed among the processors.

The table isolates these factors by including results for when the procedure calls are skewed
and when they are not. Even when two factorial calls execute on the same processors, in the
same order, throughput is increased over the single call case. This is because subtasks of the
second factorial call can execute when no work can be done on a given processor toward the
first factorial call.
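
The claim that two concurrent calls cost well under twice a single call can be checked directly against the figures in Table 4.3. The short Python script below only restates those tick counts and divides them; the dictionary and function names are hypothetical.

```python
# Tick counts copied from Table 4.3, keyed by the factorial argument.
SINGLE_CALL = {4: 1263, 8: 2691, 12: 4119}
SKEWED_SECOND_RESULT = {4: 2163, 8: 4590, 12: 6846}
NONSKEWED_SECOND_RESULT = {4: 2271, 8: 4611, 12: 6951}


def relative_cost(pair_ticks, single_ticks):
    """Time to finish two concurrent calls, as a multiple of the time
    taken by one call running alone."""
    return pair_ticks / single_ticks


def cost_table():
    """Return (argument, skewed ratio, nonskewed ratio) rows."""
    rows = []
    for argument, single in SINGLE_CALL.items():
        rows.append((
            argument,
            relative_cost(SKEWED_SECOND_RESULT[argument], single),
            relative_cost(NONSKEWED_SECOND_RESULT[argument], single),
        ))
    return rows


if __name__ == "__main__":
    for argument, skewed, nonskewed in cost_table():
        print(f"fact {argument}: skewed {skewed:.2f}x, nonskewed {nonskewed:.2f}x")
```

For every argument in the table, both ratios fall between roughly 1.66 and 1.80, and the skewed ratio is the smaller of the two.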

4.1.7 Conclusion 

Although I was pleased that the throughput of the system was better than the reciprocal
of the latency, I was disappointed by the high latency, predictable though it was. One of
my purposes in following the factorial program through each step was to show where all the
overhead was added. I should have expected the traditional costs of simulating one
architecture on another. Almost half the time was spent on the “cfuture” and “lookup” handlers,
which store suspended continuations and revive them, respectively. It would be impossible
to simulate these features with high efficiency. The problem can be summarized succinctly:
Because almost all the synchronization is handled in software, it is impractical to synchronize
on individual frame slots. While the costs incurred to synchronize on arguments and return
values would be reasonable, synchronizing on temporary values is excessively expensive. This
is exacerbated by the hybrid compiler’s lavish creation of frame slots, which make sense on
its architecture but not when synchronization is done in software.

While I was initially optimistic after the 6.4:1 code expansion because of the normal
CISC/RISC trade-offs, we see now that this number is actually irrelevant, as the vast majority
of time is spent in library routines that the hybrid architecture would have in hardware.
Simulating the hybrid architecture was thus not an optimal choice for implementing dataflow
on the J-Machine. A better choice is described in the next chapter.

fib n =
  if n <= 1 then
    n
  else
    fib (n-1) + fib (n-2);

Figure 4-6: Id Code for Fibonacci


4.2 Fibonacci 

Another program I benchmarked was the doubly-recursive Fibonacci routine shown in
Figure 4-6. The corresponding MDP code is in Appendix A.2. There are 46 lines of hybrid code,
which translate to 271 lines of MDP code, yielding a ratio of 5.9:1. Because its transformation
is so similar to factorial’s, I will not go into detail, except to mention that I added a
distribution function to load balance. Empirically, I found the function (((p and n) + (p or n)) and 31),
where p is the current processor number and n the new argument. In runs with more than
one processor, I used this function to map calls to processors.
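
The distribution function can be written out in Python, assuming that `and` and `or` in the formula denote bitwise operations on integers (the text does not say so explicitly). The function name `fib_target_node` is hypothetical.

```python
def fib_target_node(p, n):
    """Map a call with argument n, made from processor p, to one of 32
    processors. Assumes the formula's 'and'/'or' are bitwise operators."""
    return ((p & n) + (p | n)) & 31


if __name__ == "__main__":
    # Destinations of the two sub-calls made by (fib 6) running on node 5.
    print(fib_target_node(5, 5), fib_target_node(5, 4))
```

Under the bitwise reading, (p and n) + (p or n) always equals p + n, so the function is equivalent to taking p + n modulo 32: the two sub-calls of any invocation go to adjacent processor numbers.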



Argument   # Processors   Number of Ticks
1          1              166
4          1              4353
4          6              2105
6          1              13760
6          13             3473
8          22             5628
10         32             9566
12         32             67641

Table 4.4: Timings for Fibonacci. Note that, until the argument gets very large, the growth in
number of ticks is not exponential when many processors are used. Computing Fib(n) takes
a number of procedure calls that grows exponentially with n, and these calls can be
distributed among the processors.


The times and statistics for execution with different arguments are shown in Table 4.4. Note 
that for low arguments using multiple processors, growth is closer to linear than exponential. 
This is illustrated in Figure 4-7.^ 

While the number of ticks was higher than I would have liked for Fibonacci, the change 
in its order of growth was just the sort of thing one hopes to see on a parallel computer. I 
was only able to simulate 32 processors. The results should be fantastic when a 4096-node 
J-Machine comes on-line. 
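
As a rough illustration of the trend in Table 4.4, the following Python sketch computes the speedup for each argument that was timed both on one processor and on several. The tick counts are copied from the table; the names TIMINGS and speedups are hypothetical.

```python
# Rows of Table 4.4: (argument, processors, ticks).
TIMINGS = [
    (1, 1, 166), (4, 1, 4353), (4, 6, 2105), (6, 1, 13760),
    (6, 13, 3473), (8, 22, 5628), (10, 32, 9566), (12, 32, 67641),
]


def speedups(rows):
    """Return {argument: (processors, speedup)} where a 1-processor baseline exists."""
    baseline = {arg: ticks for arg, procs, ticks in rows if procs == 1}
    return {
        arg: (procs, baseline[arg] / ticks)
        for arg, procs, ticks in rows
        if procs > 1 and arg in baseline
    }
```

Only arguments 4 and 6 have single-processor baselines in the table, so only those two speedups can be computed this way.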


4.3 Loop Parallelization 

In the Fibonacci example, the parallelism was due to a function distribution strategy that I 
added by hand; thus it cannot really be counted as part of the system. This is in contrast to 
loop parallelization, for which it is straightforward for the compiler to provide parallelism: If 
there are K iteration areas, each need only be assigned unique processors to send subcalls to; 
for example, iteration area i could spawn its subcalls to processors i, i + K, etc. Because K 
would be available at run-time (and optionally at compile-time), this could easily be computed. 
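
The assignment of processors to iteration areas described above can be sketched in Python. This is a sketch under stated assumptions, not the system's code: iteration areas are numbered from 0 to K - 1, processors from 0 to num_processors - 1, and the name subcall_processors is hypothetical.

```python
def subcall_processors(i: int, k: int, num_processors: int) -> list[int]:
    """Processors reserved for iteration area i (0 <= i < k): i, i + k, i + 2k, ..."""
    return list(range(i, num_processors, k))
```

With this scheme no two iteration areas share a processor, so the subcalls of one iteration do not compete with those of another.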

Because the compiler did not handle loops, as explained in Section 2.3.3, I compiled simple 
loop programs by hand and did not have the time or compute-power for a large example. The 

^Unfortunately, I was unable to get a Concurrent Smalltalk timing to compare it to. 
def loop n = 
  { sum = 0 
    in 
      { for i <- 1 to n do 
          sum_increment = sum + i; 
          next sum = sum_increment 
        finally sum }}; 


Figure 4-8: Id Code for Loop Example 


loop program I used is shown in Figure 4-8. The program returns the sum of the first n 
integers.* 

The produced code may be seen in Appendix A.3. There are 48 lines of hybrid code which 
were translated to 188 lines of MDP code, for a 3.9:1 ratio. The better instruction ratio is 
due to my hand-compiling the code rather than using my non-optimizing compiler. While I 
purposely did not generate top-quality code, I still used better register allocation than the 
compiler, saving reloads. Another factor was the reliance on additional library routines. 

This program is a useful benchmark in that it shows the overhead to set up iteration areas 
and to launch iterations of a loop in parallel. The number of ticks, as a function of K, the 
number of iterations to unroll, and n, the argument, is 50 + 5 * K + 135 * n. The three addends 
of the formula can be interpreted: 

1. The constant term, 50, indicates that the additional cost for a procedure to use loop 
parallelization is low. There is thus no inhibition against parallelizing loops. 

2. The 5 * K term is a pleasant surprise: Once the base cost for loop parallelization has 
been paid, it only costs 5 ticks to add and support each iteration area. This makes it 
reasonable to unroll many iterations of a loop. 

3. The 135 * n term shows that each dynamic iteration of the loop is costly. However, this 
also can be thought of in terms of constant overhead: If each iteration of the loop spawns 
a long subroutine, as in the example in Figure 2-13, the only additional code that will 
execute on the loop processor is that to spawn a procedure call. This means that each 
iteration of the loop will use fewer than 200 ticks on its home processor, regardless of 
how big a computation it performs. As described above, it is trivial to distribute its 
procedure calls so that they do not interfere with those of other iterations. 

*The body of the loop could have been written more succinctly as next sum = sum + i. I work with this 
version because the hacked-up version of Iannucci’s compiler could not compile new loop programs, and the 
hybrid code for this example was the only available. 
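
The tick formula for the loop benchmark can be written as a small Python function to make the relative weight of its three terms concrete. The formula is the one given in the text; the function name loop_ticks is hypothetical.

```python
def loop_ticks(k: int, n: int) -> int:
    """Ticks for the loop benchmark: 50 + 5*K + 135*n.

    k is the number of iterations unrolled (iteration areas);
    n is the loop argument (number of dynamic iterations).
    """
    return 50 + 5 * k + 135 * n
```

For instance, with K = 2 and n = 10 the formula gives 50 + 10 + 1350 = 1410 ticks, of which the per-iteration term accounts for nearly all.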

I thus consider the loop parallelization strategy a success, although I am still dissatisfied 
with the overhead. A primary reason for the high overhead is the small number of registers 
on the MDP. There are only four general-purpose registers and four address registers. Two 
of the address registers are special-purpose and cannot be used by my system, and one, A2, 
is dedicated to holding the base of the frame. This only leaves one address register, A1, to use 
as an iteration area pointer, which is inadequate. Because hybrid addressing modes exist to 
directly access slots of the previous, current, and next iteration areas, as well as offsets from 
the current frame, it would be useful to have spare address registers for each of these pointers. 
As things are now, the value in A1 keeps getting clobbered as references are made to the other 
iteration areas, requiring the addresses to be recomputed frequently. Because of the shortage 
of general-purpose registers, I cannot use them to cache frequently-needed values. 

4.4 Conclusion 

For simple programs like factorial and Fibonacci, the code performed several times worse 
than Concurrent Smalltalk code. While this is disappointing, it is to be expected, as one 
architecture is being used to simulate another. Loop parallelization provided very promising 
results, particularly because the semantics of Id and the state of its compiler are such that the 
programmer need never be aware of possible parallelization. Any gain in parallelization and 
efficiency that occurs without any programmer effort is a big win. 
Chapter 5 


Conclusion 


And oftentimes, to win us to our harm, 

The instruments of darkness tell us truths. 

Win us with honest trifles, to betray ’s 
In deepest consequence. 

— William Shakespeare, Macbeth, Act I, Scene iii, line 123. 

The current system has several strengths and weaknesses. I consider its primary strengths 
to be: 

• It successfully simulates the hybrid architecture within an acceptable factor of code 
expansion. 

• It includes a powerful loop parallelization strategy that shows the feasibility of 
concurrent execution of iterations of a loop. 

• The observed throughput of the system implies that it succeeds to some extent at latency 
toleration — something more important in real systems and big programs than in toy 
benchmarks. 

• By taking advantage of the Id language and compiler, it is possible to write parallel 
programs for the J-Machine without explicit mention of parallelism. 

The only disappointment is that the costs of going through the hybrid architecture may 
outweigh the benefits. There are three incremental approaches that can be taken in future 
efforts: improving MDP code generation, improving hybrid code generation, and eliminating 
weaknesses of the J-Machine. I discuss each of these and then propose taking a different 
approach. 


5.1 Improving MDP Code 

As mentioned in appropriate sections throughout the document, the MDP code I produce is 
not optimal. Specifically, register assignment is primitive, and various peephole optimizations 
could be performed. In contrast, the libraries (see Appendix B) are tightly hand-coded, as 
I wrote them directly in MDP assembly language. Because roughly 80% of execution time 
is spent in the libraries, local optimizations of compiled code are unimportant. Even if I 
could double the speed of the compiled code produced, the total execution time would only 
decrease by 10%. Therefore, it is not feasible to drastically improve the code through local 
optimization. 
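
The 10% figure follows from simple arithmetic, sketched here in Python. The 80% library share is the estimate from the text; the function name and parameter names are hypothetical.

```python
def total_time_after_speedup(library_fraction: float, compiled_speedup: float) -> float:
    """Relative total time when only the compiled (non-library) code gets faster."""
    compiled_fraction = 1.0 - library_fraction
    return library_fraction + compiled_fraction / compiled_speedup
```

With library_fraction = 0.8 and compiled_speedup = 2, the result is 0.8 + 0.2 / 2 = 0.9, a 10% reduction. Even an arbitrarily large speedup of the compiled code cannot bring the total below 0.8.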


5.2 Improving Hybrid Code 

One problem with my system is that the hybrid code I begin with is non-optimal, 
particularly in terms of the J-Machine, where cfuture faults, lookup calls, etc., are costly. I think 
optimizations to the hybrid compiler would go much further than ones in my back-end for the 
MDP. For several reasons, however, it seems that modifying the hybrid compiler would be a 
poor idea: 

1. The hybrid compiler does not fit quite properly on top of the current version of the base 
Id compiler, and work would be required to bring them into sync. 

2. Particularly because the code was written by someone else, writing new code might be 
easier than modifying it. This is meant not to criticize Iannucci’s excellent and very 
readable coding style but as a general comment on the difficulty of one programmer’s 
modifying another’s code. 

3. If extensive modification or a re-write is necessary, there is no reason for the extra costs 
added by going through an intermediate architecture. 

Because the hybrid architecture is too different for the J-Machine to execute its code as 
efficiently as code generated specifically for the J-Machine, it makes little sense to put effort 
into generating hybrid code that would be better for the J-Machine. 

5.3 Strengths and Weaknesses of the J-Machine 

Several features of the J-Machine make it excellent for running dataflow code; it was designed 
to support fine-grained computation as described in [Dally 1988a]. The features not found on 
most computers that proved most beneficial were: 

1. Hardware support for cfutures. 

2. The low-latency network, which gives the freedom to send frequent messages, encouraging 
the division of tasks. 

3. User-defined tag types, which aided debugging. 

4. The large number of processors that will be available. 

There were some things I did not like about the J-Machine. Suggestions for alleviating two of 
the worst problems are: 

1. Increase the number of address and general-purpose registers. Four of each, particularly 
when some have special purposes, is inadequate, as described in Section 4.3. 

2. Hardware support for cfuture suspensions would make frequent synchronization much 
more affordable. The 18 ticks for each call of the cfuture fault handler is too expensive. 

At a recent Concurrent VLSI Architecture group meeting, I was pleased to find that others felt 
the same needs and that such changes might be made for the next version of the J-Machine. 

In several instances, however, of imperfect fits between hybrid code and the J-Machine, it 
is impossible to blame either architecture. From this observation and the above descriptions 
of rejected ideas for incremental changes, I would like to propose a different approach that 
does not rely on trying to fit the two architectures together. 
5.4 Synchronization on Tokens 


After reading this document based on Traub and Iannucci’s method of partially sequentializing 
dataflow programs, it is difficult to step back and imagine a different method that does 
not use frames and continuation lists. Such a method exists, based more directly on Greg 
Papadopoulos’ explicit token store (ETS) [Papadopoulos 1988].^ The basic idea, used on 
Monsoon, is that each cycle, a token is removed from the queue. Its context value, c, is added 
to the destination instruction offset, s, and that location is checked. If the location is empty, 
the value, v, is stored there. If the location is not empty, the value stored there must be the 
other argument, so the instruction is executed. It is not obvious why this method is better 
for the J-Machine, but empirical results suggest it is. 
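
The matching rule just described can be sketched in Python for a two-input (dyadic) instruction. This is an illustrative model only, not Monsoon's or the MDP's implementation: the dictionary frame_store stands in for token-store memory (a missing key means an empty location), and the names process_token and instructions are hypothetical.

```python
import operator


def process_token(frame_store, s, c, v, instructions):
    """Handle one token (s, c, v) under the ETS matching rule.

    frame_store: dict modelling memory; an absent key is an empty location.
    instructions: dict mapping instruction offset s to a two-argument function.
    Returns the instruction's result if it fired, otherwise None.
    """
    location = c + s
    if location not in frame_store:
        frame_store[location] = v      # first operand: wait for its partner
        return None
    other = frame_store.pop(location)  # second operand: empty the slot for reuse
    return instructions[s](other, v)
```

For example, with instructions = {3: operator.mul}, the first token (3, 100, 6) is stored at location 103 and the second token (3, 100, 7) causes the multiply to fire, yielding 42.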

Last year, as a UROP, I designed a method to use the explicit token store on the MDP 
[Spertus 1989]. Its only similarity to my new system is that the message words are: 

1. Instruction address, s. 

2. Context value, c. 

3. Data value, v. 

Figures 5-1 and 5-2 show code for the -1 and multiply nodes, respectively, from the dataflow 
graph in Figure 4-2, which provide examples of monadic and dyadic nodes. The cfuture fault handler 
is only two lines long and is shown in Figure 5-3. Bill Dally played a major role in developing 
these templates. For further details, such as the calling convention, see [Spertus 1989]. 

The only benchmark for this system is factorial. It took 431 ticks to compute 4!, compared 
to the new system’s 1263. The comparison is fair even though it is between hand-compiled 
and machine-generated code, as transforming dataflow nodes is straightforward. Actually, 
the comparison is unfair in the other direction, because so much intelligent effort has been 
put into the hybrid system. If I had spent the past year studying how to improve the ETS 
code, such as by discovering how to combine a few instructions with known orderings into a 
single macro-dataflow node, this technique would surely surpass the performance of the hybrid 
system, especially because it is already better. 

^While Iannucci’s method uses an explicit token store also, the schema I am presenting is more trivially 
based on Papadopoulos’ ideas. 
; Subtract 1 node 
fact1_node5: 
        move    [2,A3], R1 
        sub     R1, 1, R1 
        DC      MSG:(fact1_node7_left<<sys_len_bits)+3 
        send2   3, R0, 0 
        send2e  [1,A3], R1, 0 
        suspend 

Figure 5-1: A Monadic Node Using ETS. The data value is taken from offset two in the 
message, and the constant 1 is subtracted. The result is sent to the left input of node 7 on 
processor 3. 


; Multiply node 
fact1_node8: 
        move    [1,A3], R0      ; Put data_addr in A2 
        move    R0, A2 
        move    [2,A3], R1      ; Get new argument 
        ; This line may fault 
        mul     R1, [0,A2], R1 
        DC      MSG:(fact1_node9_right<<sys_len_bits)+3 
        send2   6, R0, 0 
        send2e  [1,A3], R1, 0 
        ; Clean up 
        wtag    R0, CFUT, R0 
        move    R0, [0,A2] 
        suspend 


Figure 5-2: A Dyadic Node Using ETS. If the other argument has not arrived yet, a fault 
occurs instead of a multiply. The fault handler will write the new argument to the faulted 
slot. If the other argument is already there, the multiply proceeds, and a token is sent to node 
9, after which the slot must be emptied if the frame is to be reused. 
; cfuture handler 
fault_cfut_loc: 
        move    R1, [0,A2] 
        suspend 


Figure 5-3: The Cfuture Handler for ETS. It simply moves the new argument, guaranteed to 
be in R1, into the slot reserved for the argument. 


5.5 Conclusion 

Even though I do not think the system good enough to justify continuing dataflow research 
on the J-Machine by building on it, I consider the experiment with the hybrid architecture to 
be a success. In addition to the successful results described in the analysis, there were several 
other successful aspects to the project: 

• By working with both the Computation Structures Group and the Concurrent VLSI 
Architecture group, I was able to help cross-fertilize two groups that have very different 
outlooks on the same problem, parallel computation. MIT has been criticized for not 
having enough communication between groups. 

• By stretching the J-Machine in ways its designers never imagined, I have found some 
of its limits. While this does not mean the J-Machine is flawed or necessarily should 
be changed, its architects should keep aware of what trade-offs they have made and 
reconsider them. 

• In the process of building my compiler, I have built utilities that will convert among 
different formats of MDP code. This should aid other J-Machine programmers in future 
work. 

• There have still been few enough MDP coders that I have significantly increased the 
number of hours spent MDP hacking. I have helped contribute to the set of known neat 
hacks for the J-Machine (such as with the code in Figure 3-6). 

• By proving the feasibility of parallelizing iterations of a loop and presenting ideas on 
how “straight-line” Id code could be better converted, I have made a powerful case for 
Appendix A 


MDP Program Examples 

A.1 MDP Code for Factorial 


module FACT 


j((:LABEL (iLITERAL (:SYHB0L :Sq-l)))) 

sq.i: 

HOVE [1,A3], 113 
MOVE R3, A2 

;((:M0VE (:LITERAL (:CODE-BLOCK :FACT)) (:FRAME (:BASE 12)))) 

MOVE 12. R1 

CALL LOOKUP.VECTOR 

DC {FACT_cod«block_p*f> 

HOVE RO. [12.A2] 

;((:COITIIUE-TEST (:FRAHE (:BASE3) :SUSPEISIVE) (:LITERAL (:SYHBOL :Sq-2)))) 
DC {Sq_2_m«g.r«f> 

MOVE 3, R1 
CALL CITT.VECTOR 

;((:COITIIUE-TEST (:FRAHE (:BASE 0) :SUSPEISIVE) (:LITERAL (:SYHB0L :Sq-ll)))) 
DC {Sq_ll_iiisg_rrf} 

MOVE 0, R1 
CALL CITT.VECTOR 

;((:LABEL (:LITERAL (:SYMBOL :SEID-RESULT-0)))) 

SEID.RESULT.O: 

MOVE [1,A3], R3 
HOVE R3, A2 

;((:MOVE-REHOTE (:FRAME (:BASE 0) rSUSPEISIVE) 

; (: LITERAL (:IITEGER D) 

; (:FRAME (:BASE 6) :SUSPEISIVE))) 

SUSPEISIVE4683: 

MOVE [1,A3], R3 
MOVE R3, A2 

DC {SUSPEISIVE4683jiisg_rei} 

RTAG [0,A2]. R3 






RTA6 [6,A3], R3 
SEIDO [O.A2] 

DC ■aOCAL_HOVa_msg_r«f> 

SEIDO no 
SEIDO [O.A3] 

SEIDO 1 
SEIDEO [6,AS] 

;((:H0VE-RE1(0TE (:FRAME (:BASE 0)) (;LITERAL (illTEGER 0)) 
; (:FRAME (:BASE E) rSUSPEISIVE))) 

SUSPEISIVE4689: 

MOVE [l.AS], R3 
MOVE R3, A3 

DC '[SUSPEISIVE4689_iiisg.raf} 

RTAG [E,A2], R3 
SEIDO [0,A3] 

DC ■CLOCAL.MOVR.msg.raf} 

SEIDO RO 
SEIDO [0.A3] 

SEIDO 0 
SEIDEO [E,A2] 

;((:TERMIIATE)) 

SUSPEID 

;((:LABEL (:LITERAL (:SYMBOL :Sq-ll)))) 

SQ.ll: 

MOVE [l.AS], R3 
MOVE R3. A2 


;((:TEST-2 (:FRAME (:BASE 0) :SUSPEISIVE) (:FRAME (:BASE 7} :SUSPEISIVE) 
; (:FRAME (:BASE E)))) 

SUSPEISIVE469E: 

MOVE [l.AS], RS 
MOVE RS, A2 

DC ■[SUSPEISIVE469E_;iisg_T«f} 

RTAG [0,A3], RS 
RTAG [7.AS], RS 
MOVE E, R1 
CALL LOOKUP.VECTOR 
MOVE trua. RS 
MOVE RS, [E,A2] 

;((:TERMIIATE)) 

SUSPEID 

;((:LABEL (:LITERAL ({SYMBOL :Sq-2)))) 

Sq_2: 

MOVE [l.AS], RS 
MOVE RS, AS 

;((:<= ({FRAME ({BASE S) {SUSPEISIVE) ({LITERAL ({IITEGER D) 

; ({FRAME ({BASE 4)))) 

SUSPEISIVE4702{ 

MOVE [l.AS], RS 
MOVE RS, AS 

DC ■[SUSPEISIVE4702^g_raf> 

RTAG [S,A2], RS 
MOVE 4, R1 
CALL LOOKUP.VECTOR 
MOVE [S.AS], RS 
LE RS. 1, RS 
MOVE RS. [4,AS] 
;((:BRANCH-FALSE (:FRAME (:BASE 4)) (:LITERAL (:SYMBOL :ELSE-4)))) 
MOVE [4.12], R3 
BT R3, 2 

DC •[ELSE_4_ip_r«f> 

MOVE RO. IP 


;((:MOVE-IDEBTITY (tFRlME (:B1SE 3) iSUSPEISIVE) (:FRAME (:BASEe)))) 
SUSPEISIVE4710: 

MOVE [1,13], R3 
MOVE R3, 12 

DC ■CSUSPEISIVE4710_msg_r«l> 

RTIG [3,12], R3 
MOVE 6, R1 
CALL LOOKUP.VECTOR 
MOVE [3,12], R3 
MOVE R3. [6.12] 

j((:MOVE-IDElTITY (:FR1ME (:B1SE 4)) (:FR1ME (:BASE 7)))) 

HOVE 7, R1 
CALL LOOKUP.VECTOR 
MOVE [4,A2], R3 
HOVE R3, [7,12] 

;((:BRAICH (:LITERAL (:SYMBOL :EID-IF-4)))) 

DC {EID_IF.4_ip_r«f> 

HOVE RO. IP 

:((:LABEL (:LITERAL (;SYMBOL :ELSE-4)))) 

ELSE.4: 

MOVE [1,13], R3 
MOVE R3, 12 

(:FRAME (:B1SB 3) :SUSPEISIVE) (:LITERAL (:IITEGER 1)) 

; (:FRAME (:BASE 14)))) 

SUSPEISIVE4718: 

HOVE [1,13], R3 
MOVE R3, 12 

DC {SUSPEISIVE4718jiisg_raf> 

RTA6 [3,12], R3 
HOVE 14, R1 
CALL LOOKUP.VECTOR 
MOVE [3,12], R3 
SUB R3, 1, R2 
MOVE R2, [14,12] 

;((:COITIIUE (iLITERlL (:SYHBOL *:Sq4674)))) 

MOVE HR. R3 
SEIDO R3 

DC {SQ4674jii8g.r«f} 

SEIDO RO 
SEIDEO 12 

;((:GET-COITEXT (:FR1ME (:B1SE 12)) (:FR1ME (:B1SE 10)))) 

MOVE 12, R3 

HTIG R3, IIT, R3 

LSH R3, 6, R3 

MOVE KIR, R2 

ADD R3, R2, R3 

VT16 R3, FD. R3 

sandO [14,12] ; argument 

DC ‘[LOClL.GETC.msg.raf} 

SEIDO RO 
SEIDO [12,12] 
SEND0 R3 
SEIDEO 10 

;((:SPECZiL-TEST-l (:PRIME (:B1SE 10)))) 

SUSPEISIVE4729: 

HOVE [1,13], R3 
HOVE R3, 12 

DC {SUSPEESIVE4729^g_r«f> 

RT16 [10,12], R3 

;((:IIDEZ-CURREIT-CQITEZT (:LITER1L (:B1SE 8)) (:REGISTER 0))) 

HOVE 12, R3 
«T16 R3, in, R3 
MOVE RO, R2 
DC 8192 
IDD R3, RO, R3 
LSH R3, 6, R3 
MOVE HER, R1 
IDD R3, Rl, R3 
HTIG R3, FD, R3 
MOVE R3, [0,10] 

:((:MOVE-REHOTE (:PRIME (:B1SE 10)) (:LITER1L (iIITEGERO)) 

; (:REGISTER 0))) 

SEIDO [10,12] 

DC ■[LOClL_HOVR.iiisg_r«f> 

SBIDO RO 
SEIDO [10,12] 

SEIDO 0 
SEIDEO [0,10] 

; ((:MOVE-REHOTE (;PRIME (:B1SE 10)) (:LITER1I. (;IITESER 3)) 

; (:FR1ME (:B1SE 14)))) 

SEIDO [10,12] 

DC •[LDClL_MOVR_i«sg_r«f> 

SEIDO RO 
SEIDO [10,12] 

SEIDO 3 
SEIDEO [14,12] 

;((:TERHII1TE)) 

SUSPEID 

;((:L1BEL (iLITERlL (;SYMBOL t:Sq4674)))) 

Sq4674: 

MOVE [1,13], R3 
MOVE R3, 12 

;((:COniIUE-TEST (:PR1HB (:B1SE3) :SUSPEISIVB) (:LITER1L (tSYHBOL :Sq-S)))) 
DC {Sq_S_msg_r«f> 

MOVE 3, Rl 
CALL CITT.VECTOR 

; ((:COniIUB-TEST (:PRIME (:B1SE8) :SUSPEISIVE) (:LITER1L (:SYMBOL :Sq-8)))) 
DC {Sq_8_msg_r«f> 

MOVE 8, Rl 
CILL CITT.VECTOR 

;((:L1BEL (:LITER1L (:SYMBOL :EID-IF-4)))) 

EID_IP_4: 

HOVE [1,13], R3 
HOVE R3, 12 

;((:TERHII1TE)) 
SUSPEND 


;((:L1BEL (:LITERAL (tSYHBOL :Sq-8)))} 

Sq_8; 

MOVE [1.A3]. S3 
KOVE R3. A3 

;((:TEST-1 (:FRAME (:BASE 8) :SUSPEISIVE) (:REGISTER 0))) 

SUSPEISIVE4741: 

MOVE Cl.A3], R3 
MOVE R3, A2 

DC ■[SUSPEISIVE4741jug_r«f> 

STAG [8,AS], R3 
MOVE tru«, R3 
MOVE R3, CO,AO] 

;((:SETURI-COITEXT (:FRAME (:BASE 10) :SUSPEISIVE) (:FRAME (:BASE 11}))) 
SUSPEISIVE474e: 

MOVE Cl.A3]. R3 
MOVE R3. A2 

DC ■CSUSPEBSIVE474e^sg_r«f> 

STAG ClO,A2]. S3 
MOVE 11, R1 
CALL LOOKUP.VECTOR 
MOVE tru*. R3 
MOVE R3, Cll,A3] 

:((:MOVE-IDEITITY (:FRAME (:BASE 11)) (:FRAME (:BASE 7)))) 

MOVE 7, R1 
CALL LOOKUP.VECTOR 
MOVE C11.A2]. R3 
MOVE R3. C7,A3] 

;((:TERMIIATE)) 

SUSPEID 

;((:LABEL (:LITERAL (:SYMBOL :Sq-E)))) 

Sq.E: 

MOVE Cl.A3]. R3 
MOVE R3, A3 

;((:• (:FRAME (:BASE 3) :SUSPEISIVE) (:FRAME (;BASE 9) :SUSPEISIVE) 

; (iFRAME (:BASE 13)))) 

SUSPEISIVE47E3: 

MOVE Cl,A3]. R3 
MOVE R3, A2 

DC {SUSPEISIVE47E3jusg_r«f> 

STAG C3.A2], R3 
STAG C9,A2], R3 
MOVE 13. R1 
CALL LOOKUP.VECTOR 
MOVE C3,A2], S3 
MUL R3. C9,A2], R1 
MOVE Rl, C13,A2] 

:((:MOVE-IDEITITY (:FRAME (:BASE 13)) (:FRAME (:BASE 6)))) 

MOVE 6, Rl 
CALL LOOKUP.VECTOR 
MOVE C13.A3], R3 
MOVE R3, C6,A2] 

;((:TERMIIATE)) 

SUSPEID 

•nd 
ref SUSPENSIVE4753_msg_ref = MSG:(((SUSPENSIVE4753+FACT_loc)<<10))+2 
r»f SUSPEISIVE4746_iii«g_r«f = MSG: ({(SUSPEBSIVE4746+FACT.loc)«10))+2 
r*f SUSPEISIVE4741_iiisg_r*f = MSG: (((SUSPEISIVE4741+FACT_loc)«10))+2 
r«f Sq_8_B»g_r*f = MSG:(((Sq_8+FiCT_loc)«10))+2 
r«f Sq_B_iiisg_rel - MSG: (((Sq_B+FiCT_loo)«10))+2 

r«f SUSPEISIVE4729_nsg.r«f » MSG: (((SUSPEISIVE4729+FlCT_loc)«10))+2 
r»f Sq4674_m«g_r«f = MSG: (((Sq4674+FACT_loc)«10))+2 
r«f SUSPEISIVE4718jiisg_rei = MSG: (((SUSPEBSIVE4718+FACT.loc)«10))+2 
r«f SUSPEISIVE4710_m»g_p«f = MSG: (((SUSPEISIVE4710+FiCT_loc)«10))+2 
r«f SUSPEISIVE4702_msg_r»f « MSG: (((SUSPEISIVE4702+FACT_loc)«10))+2 
r»f SUSPEISIVE469B_iiisg_r«f » MSG: (((SUSPEISIVE469S+FiCT.loc)«10))+2 
r»f SUSPEISIVE4689_mBg_p«f = MSG: (((SUSPEISIVE4689+FiCT_loc)«10))+2 
ref SUSPEISIVE4683_m8g_ref = MSG: (((SUSPEISIVE4683+FiCT_loc)«10))+2 
r«f Sq_ll_m«g_r«f = MSG: (((Sq_ll+riCT_loc)«10))+2 
ref Sq_2_iiisg_ref = MSG: (((Sq_2+FiCT_loc)«10))+2 
ref EID_IF_4_ip_ref = IP: (((EED_IF_4+FACT_loc)«10))+lBSOLUTE 
ref ELSE_4_ip_ref = IP: (((ELSE.4+FACT_loc)«10))+ABS0LUTE 
ref FACT.cedeblock.ref = CB: (FACT_loc«16)+lB 
A.2 MDP Code for Fibonacci 


module FIB 


;((:L1BEL (;LITEIUL (iSYMBOL :Sq-l)))) 

SQ_1: 

MOVE [l.iS], R3 
MOVE R3. A2 

;((:MOVE (:LITERU. (:C0DE-BL0CE :FIB)) (iFRAME (iBiSE 18)))) 

MOVE 18, El 

CALL LOOKUP.VECTOR 

DC {FIB_eod«block_raf} 

MOVE RO, [18,K2] 

;((:COITIIUE-TEST (:FRAME (:BASE3) :SUSPEISIVE) (:LITERAL (:SYMBOL :Sq-2)))) 

DC <Sq_2_»»g_rrf> 

MOVE 3, R1 
CALL CITT_VECTOR 

j((:COmiUE-TEST (:FRAME (:BASE 0) :SUSPEISIVE) (iLITERAL (iSYMBOL :Sq-17)))) 

DC {Sq.l7_iii«g_r«f> 

MOVE 0, R1 
CALL CITT.VECTOR 

;((:LABEL (:LITERAL (:SYMBOL :SEID-RESULT-0)))) 

SEID.RESULT.O: 

MOVE [1,A3], R3 
MOVE R3, A2 

;((:HOVE-REMOTE (:FRAME (:BASE 0) :SUSPEHSIVE) 

; (:LITERAL (iIITEGER 1)) 

i (:FRAME (:BASE 6) iSUSPEISIVE))) 

SUSPEISIVE2503: 

MOVE [1,A3], R3 
MOVE R3, A2 

DC •CSUSPEISIVE2603jQsg_ref> 

RTAG [0,A2], R3 
RTAG [6,A2], R3 
SEIDO [0,A2] 

DC ■CLOCAL.MOVR.insg.raO 
SEIDO RO 
SEIDO [0,A2] 

SEIDO 1 
SEIDEO C6,A2] 

;((:MOVE-REMOTE (:FRAME (:BASEO)) (:LITERAL (:IITEGER 0)) (iFRAME (:BASE B) :SUSPEBSIVE))) 
SUSPEISIVE2E09: 

MOVE [1,A3], R3 
MOVE R3, A2 

DC ■CSUSPEISIVE2509.liisg_Tef} 

RTAG [E,A23, R3 
SEIDO [0,A2] 

DC {LOCAL_MOVR_iiisg.r*f> 

SEIDO RO 
SEIDO [0,A2] 

SEIDO 0 
SEIDEO [E,A2] 

;((:TERMIIATE)) 
SUSPEND 


;((:L1BEL (iLITERlL (:SY1IB0L :Sq-17)))) 

SQ.lTi 

HOVE [1.13], E3 
HOVE R3. 12 

;((:TEST-2 (:FR1HE (:B1SE 0) :SUSPEISIVE) (:FK1HE (:B1SE 7} :SUSPEISIVE) (;FR1HE (:B1SE E)))} 
SUSPEISIVE2E1E: 

HOVE [1,13], 113 
HOVE R3. 12 

DC ■CSUSPEISIVE2ElS_BSg_r*f> 

RTIG [0,12], R3 
RT16 [7,12], R3 
HOVE 5, R1 
CILL LOOKUP.VECTOR 
HOVE tm«. R3 
HOVE R3, [E.12] 

;((:TERHII1TE)) 

SUSPEID 

;((:L1BEL (:LITER1L (:SYHBOL :Sq-2)))) 

Sq_2: 

HOVE [1,13], R3 
HOVE R3, 12 

;((:<> (:FR1HE (:B1SE 3) :SUSPEISIVE) (:LITER1L (:IITEGER1)) (iFRlHE (:B1SE4)))) 
SUSPEISIVE2E22: 

HOVE [1,13], R3 
HOVE R3. 12 

DC {SUSPEISIVE2E22.]Ug_r«f} 

RTIG [3,12], R3 
HOVE 4, R1 
CILL LOOKUP.VECTOR 
HOVE [3,12], R3 
LE R3, 1, R2 
HOVE R2, [4,12] 

;((:BR1ICH-F1LSK (:FR1HE (iBlSE 4)) (:LITER1L (tSYHBOL :ELSE-4)))) 

HOVE [4,12], R3 
BT R3, 2 

DC {ELSE_4.ip_r«f> 

HOVE RO, IP 


;((:HOVE-IDEmTY (:FR1HE (:B1SE 3) :SUSPEISIVE) (:FR1HE C:B1SE6)))) 
SUSPEISIVE2S30: 

HOVE [1,13], R3 
HOVE R3, 12 

DC ■[SUSPEISIVE2E30.nsg_raf> 

RTIG [3,12], R3 
HOVE 6. R1 
CILL LOOKUP.VECTOR 
HOVE [3,12], R3 
HOVE R3, [6,12] 

;((:HOVE-IDEITITY (:FR1HE (:B1SE 4)) (:FR1HE (:B1SE 7)))) 

HOVE 7, R1 
CILL LOOKUP.VECTOR 
HOVE [4,12], R3 
HOVE R3, [7,12] 

;((:BR1ICH (:LITER1L (:SYHBOL :EID-IF-4)))) 
DC {END_IF_4_ip_ref} 

MOVE RO. IP 

;((:LABE1. (sLITERAL (:SYMBOL :ELSE-4)))) 

ELSE_4: 

MOVE [1.A3]. R3 
MOVE R3. A3 

(:FRAME (:BASE 3) :SUSPEISIVE) (:LITERAL (iIITEGER D) (:FRAME (:BASE 20)}}) 
SUSPEISIVE3E38: 

MOVE [1.A3], R3 
MOVE R3, A2 

DC ■CSUSPEISIVE2S38_BSg_r«f> 

RTAO [3,AS]. R3 
MOVE 30, R1 
CALL LOOKUP.VECTOR 
MOVE [3,AS], R3 
SUB R3, 1, R2 
MOVE R2, [30,AS] 

(:FRAME (:BASE 3} sSUSPEISIVE) (:LITERAL (:IITEGER 2}} (:FRAME (:BASE 19}}}} 
SUSPEISIVESE44: 

MOVE [1,A3], R3 
MOVE R3. A2 

DC {SUSPEISIVE2E44_msg_r«f} 

RTAG [3,AS], R3 
MOVE 19, R1 
CALL LOOKUP.VECTOR 
MOVE [3,A2], R3 
SUB R3, 2, R2 
MOVE R2, [19.A2] 

;((:COITIIUE (:LITERAL (:SYMBOL ttSOSASO}}}} 

MOVE HR, R3 
SEIDO R3 

DC {SQ2490jiiBg_r«f> 

SEIDO RO 
SEIDEO AS 

;((:GET-COITEZT (:FRAME (:BASE 18}} (:FRAME (:BASE 12}}}} 

MOVE AS, R3 
HTAG R3, IIT, R3 
LSH R3, 6, R3 
MOVE HR, R2 
ADD R3, R2, R3 
WTAG R3, FD, R3 
; mak* np dastination 
moT* [19,A2],R1 
or Rl, R2. RO 
and Rl, R2, R3 
add RO, R2. Rl 
HOT* 31, RO 
and Rl, RO, Rl 
aandO Rl 
; sondO 1 

DC {LOCAL_GETC_iii»g_rof> 

SEIDO RO 
SEIDO [18,AS] 

SEIDO R3 
SEIDEO 12 

j((:SPECIAL-TEST-1 (:FRAME (:BASE 12}}}} 

SUSPEISIVE2EEE: 

MOVE [1,A3], R3 
MOVE R3, A2 

DC {SUSPEISIVE2EEE_ug_r*f> 

RT16 [12,13]. >13 

;((:IIDEZ-CUIUlEIT-COITEXT (:LITER1L (;B1SE 10)) (:REGISTER 0))) 

HOVE 12. R3 
HTIG R3. IIT, R3 
MOVE RO, R2 
DC 10240 
IDD R3, RO, R3 
LSH R3, 6, R3 
HOVE HR, R1 
IDD R3. Rl. R3 
HTIG R3, FD, R3 
HOVE R3. [0,10] 

;((:MOVE-REHOTE (;FR1ME (:B1SE 12)) (:LITER1L (iIITEGERO)) (:REGISTER 0))) 

SEIDO [12,12] 

DC ■[LOClL_MOVR_ws_r«f} 

SEIDO RO 
SEIDO [12,12] 

SEIDO 0 
SEIDEO [0,10] 

;((:MOVE-REMOTE (:FR1ME (:B1SE 12)) (:LITER1L (iIITEGER 3)) (:FR1HE (:B1SE 19)))) 
SEIDO [12,12] 

DC {LOClL_HOVR_iug_raf} 

SEIDO RO 
SEIDO [12,12] 

SEIDO 3 
SEIDEO [19,12] 

;((:TERMII1TE)) 

SUSPEID 

;((:L1BEL (;LITER1L (:SYMBOL t:Sq2490)))) 

Sq2490: 

HOVE [1,13], R3 
MOVE R3. 12 

:((:COITIIUE-TEST (iFRlHE (:B1SE 10) :SUSPEISIVE) (:LITER1L (:SYHBOL iSQ-8)))) 

DC '[Sq_8_msg_r«f} 

MOVE 10, Rl 
CILL CITT.VECTOR 

;((:COITIIUE (:LITER1L (;SYMBOL t:Sq2494)))) 

MOVE HR, R3 
SEIDO R3 

DC {Sq2494_msg_ref> 

SEIDO RO 
SEIDEO 12 

;((:6ET-C0ITEZT (:FR1HB (:B1SE 18)) (:FR1HE (:B1SE IE)))) 

HOVE 12, R3 
HTIG R3, IIT, R3 
LSH R3, 6, R3 
HOVE HR, R2 
IDD R3, R2, R3 
HTIG R3, FD, R3 
; maka up dastination 
mova [20,12],R1 
or Rl, R2. RO 
and Rl, R2, R2 
add RO. R2. Rl 
move 31, R0 
and Rl. RO, R1 
■•ndO Rl 

; SEIDO 1 

DC ■CLOClL.GETC.BSg.raf} 

SEIDO RO 
SEIDO [18,A2] 

SEIDO R3 
SEIDEO 16 

;((:SPECI1L-TEST-1 (:FRAHE (:BASE 16)))) 

SUSPEISIVE2670: 

MOVE [1,A3], R3 
HOVE R3, 12 

DC •CSUSPEISIVE2670_iiisg_ref} 

RTIG [16,12], R3 

;((:IIDEZ-CURREIT-COITEZT (:LITER1L (:B1SE 13)) (:REGISTER 0))) 

HOVE 12, R3 
HT16 R3, IIT, R3 
MOVE RO, Rl 
DC 13312 
IDD R3, RO, R3 
LSI R3, 6, R3 
HOVE HR, RO 
IDD R3, RO, R3 
WTIG R3, FD, R3 
HOVE R3, [0,10] 

;((:HOVE-REHOTE (:FR1HE (:B1SE 16)) (:LITER1L (:IITEGER 0)) (:REGISTER 0))) 

SEIDO [16,12] 

DC ■aOClL_MOVR_iiisg_ref> 

SEIDO RO 
SEIDO [16,12] 

SEIDO 0 
SEIDEO [0,10] 

:((:MOVE-SEHOTE (:FR1ME (:B1SE 16)) (iLITERlL (tllTEGERS)) (:FR1HE (:B1SE 20)))) 
SEIDO [16,12] 

DC •[LOClL_MOVR_iiisg_T*f} 

SEIDO RO 
SEIDO [16,12] 

SEIDO 3 
SEIDEO [20,12] 

;((:TERHII1TE)) 

SUSPEID 

;((:L1BEL (:LITERAL (:SYMBOL *:Sq2494)))) 

Sq2494: 

HOVE [1,13], R3 
HOVE R3, 12 

;((:COITIIUE-TEST (:FR1HE (:B1SE 14) :SUSPEISIVE) (:LITBR1L (:SYHBOL :Sq-12)))) 

DC {Sq_12_iiisg_r«f> 

HOVE 14, Rl 
CALL CITT.VECTOR 

;((:COITIIUE-TEST (:FR1HE (:B1SE 13) iSUSPEISIVE) (:LITER1L (:SYHBOL :Sq-13)))) 
DC ■[Sq_13_iug_r«f> 

HOVE 13, Rl 
CALL CITT.VECTOR 

;((:COITIIUE-TEST (:FR1HE (:B1SE8) iSUSPEISIVE) (:LITER1L (:SYHBOL :Sq-14)))) 
DC {SQ_14_msg_ref} 

HOVE 8. R1 
CALL CITT.VECTOR 

;((:LABEL (:LITERAL (:SYMBOL :EI0-IF-4)))) 

EID_IF_4: 

HOVE [1,A3]. R3 
HOVE R3. A2 

;((:TERHIIATE}} 

SUSPEID 

;((;LABEL (iLITERAL (:SYNBOL :Sq-14)))) 

Sq_14i 

MOVE [1,A3], R3 
MOVE R3, A2 

;((:TEST-2 (;FRAME (:BASE 8) tSUSPEISIVE) (:FRAME (:BASE 9) :SUSPEISIVE) (:FRAME (:BASE 16)))) 
SUSPEISIVE2S82: 

MOVE [1,A3], R3 
MOVE R3, A2 

DC -[SUSPEISIVE2S82_iisg_raf} 

RTA6 C8,A2], R3 
RTAG [9.A2], R3 
MOVE 16. R1 
CALL LOOKUP.VECTOR 
HOVE txu*. R3 
MOVE R3, [16.A2] 

;((:MOVE-IDEITITY (:FRAME (;BASE 16)) (:FRAME (:BASE 7)))) 

ROVE 7. R1 
CALL LOOKUP.VECTOR 
HOVE [16.A2]. R3 
HOVE R3. [7.A2] 

;((:TERMIIATE)) 

SUSPEID 

;((:LABEL (:LITERAL (:SYMBOL :Sq-13)))) 

Sq_13: 

MOVE [1.A3]. R3 
MOVE R3. A2 

;((:TEST-1 (:FRAME (:BASE 13) :SUSPEISIVE) (:REGISTER 0))) 

SDSPEISIVE2690: 

MOVE [1.A3]. R3 
HOVE R3. A2 

DC ■[SUSPEISIVE2590.msg_r«l> 

RTAG [13.A2]. R3 
MOVE tru*. R3 
MOVE R3. [O.AO] 

;((:RETURI-COITEZT (:FRAHE (sBASE IE) :SUSPEISIVE) (:FRAHE (:BASE 8)))) 

SUSPEISIVE2595: 

HOVE [1,A3]. R3 
HOVE R3. A2 

DC ■[SUSPEISIVE2E9Ejiisg_raf> 

RTAG [16.A2]. R3 
MOVE 8. R1 
CALL LOOKUP.VECTOR 
MOVE true. R3 
MOVE R3. [8.A2] 

;((:TERHIIATE)) 
SUSPEND 


;((:L1BEL (:LITERiL (:SYMBQL :Sq-12)))) 

SQ.12: 

MOVE [1,13], R3 
MOVE R3, 12 

;((:+ (:FR1ME (:B1SE 14) :SUSPEISIVE) (:FR1ME (:B1SE 11) :SUSPEISIVE) (:FR1HE (:B1SE 17)))) 
SUSPEISIVE2601: 

MOVE [1,13], R3 
MOVE R3, 12 

DC -CSUSPEISIVE2601_iiisg_raf> 

RTIG [14,12], R3 
RTIG [11,12], R3 
MOVE 17, R1 
CILL LOOIUP.VECTOR 
MOVE [14,12], R3 
IDD R3, [11,12], RO 
MOVE RO, [17,12] 

;((:MOVE-IDEITITY (:FR1ME (:B1SE 17)) (:FR1ME (:B1SE 6)))) 

MOVE 6, R1 
CILL LOOKUP.VECTOR 
MOVE [17,12], R3 
MOVE R3, [6,12] 

;((:TERMII1TE)) 

SUSPEID 

;((:L1BEL (rLITERlL (:SYMBOL :SQ-8)))) 

sq.s: 

MOVE [1,13], R3 
MOVE R3, 12 

;((:TEST-1 (:FR1ME (:B1SE 10) :SUSPENSIVE) (:REGISTER 0))) 

SUSPEISIVE2610: 

ROVE [1,13], R3 
ROVE R3, 12 

DC {SUSPEISIVE2610_msg_r«f> 

RTIG [10,12], R3 
HOVE true, R3 
HOVE R3, [0,10] 

;((iRETURI-COITEXT (:FR1HB (:B1SE 12) :SUSPEISIVE) (:FR1HE (:B1SE 9)))) 

SUSPEISIVE261E: 

HOVE [1,13], R3 
HOVE R3, 12 

DC '[SUSPEISIVE261E_iBSg.ref} 

RTIG [12,12], R3 
MOVE 9, R1 
CILL LOOKUP.VECTOR 
MOVE true, R3 
MOVE R3, [9,12] 

;((:TERMII1TE)) 

SUSPEID 

•nd 

ref SUSPENSIVE2615_msg_ref = MSG: (((SUSPENSIVE2615+FIB_loc)<<10))+2
ref SUSPENSIVE2610_msg_ref = MSG: (((SUSPENSIVE2610+FIB_loc)<<10))+2
ref SUSPENSIVE2601_msg_ref = MSG: (((SUSPENSIVE2601+FIB_loc)<<10))+2
ref SUSPENSIVE2595_msg_ref = MSG: (((SUSPENSIVE2595+FIB_loc)<<10))+2
ref SUSPENSIVE2590_msg_ref = MSG: (((SUSPENSIVE2590+FIB_loc)<<10))+2
ref SUSPENSIVE2582_msg_ref = MSG: (((SUSPENSIVE2582+FIB_loc)<<10))+2
ref SQ_14_msg_ref = MSG: (((SQ_14+FIB_loc)<<10))+2
ref SQ_13_msg_ref = MSG: (((SQ_13+FIB_loc)<<10))+2
ref SQ_12_msg_ref = MSG: (((SQ_12+FIB_loc)<<10))+2

ref SUSPENSIVE2570_msg_ref = MSG: (((SUSPENSIVE2570+FIB_loc)<<10))+2
ref SQ2494_msg_ref = MSG: (((SQ2494+FIB_loc)<<10))+2
ref SQ_8_msg_ref = MSG: (((SQ_8+FIB_loc)<<10))+2

ref SUSPENSIVE2555_msg_ref = MSG: (((SUSPENSIVE2555+FIB_loc)<<10))+2
ref SQ2490_msg_ref = MSG: (((SQ2490+FIB_loc)<<10))+2
ref SUSPENSIVE2544_msg_ref = MSG: (((SUSPENSIVE2544+FIB_loc)<<10))+2
ref SUSPENSIVE2538_msg_ref = MSG: (((SUSPENSIVE2538+FIB_loc)<<10))+2
ref SUSPENSIVE2530_msg_ref = MSG: (((SUSPENSIVE2530+FIB_loc)<<10))+2
ref SUSPENSIVE2522_msg_ref = MSG: (((SUSPENSIVE2522+FIB_loc)<<10))+2
ref SUSPENSIVE2515_msg_ref = MSG: (((SUSPENSIVE2515+FIB_loc)<<10))+2
ref SUSPENSIVE2509_msg_ref = MSG: (((SUSPENSIVE2509+FIB_loc)<<10))+2
ref SUSPENSIVE2503_msg_ref = MSG: (((SUSPENSIVE2503+FIB_loc)<<10))+2
ref SQ_17_msg_ref = MSG: (((SQ_17+FIB_loc)<<10))+2
ref SQ_2_msg_ref = MSG: (((SQ_2+FIB_loc)<<10))+2
ref END_IF_4_ip_ref = IP: (((END_IF_4+FIB_loc)<<10))+ABSOLUTE
ref ELSE_4_ip_ref = IP: (((ELSE_4+FIB_loc)<<10))+ABSOLUTE
ref FIB_codeblock_ref = CB: (FIB_loc<<16)+21
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A.3 MDP Code for Loop Example 


This is a rewrite of the loop program with non-Iannucci structures.
Instead of having:


/->i i<->i i<->i i<—\ 
\-/ 


use a setup where different iterations' iteration pointers are
contiguous:


(Think of as ”-1") 


(Think of as "K") 


; The message format will be:
;   MSG:location of code
;   ADDR:frame base
;   ADDR:location of iteration pointer
; A2 will be loaded with the frame base.
; A1 will be loaded with the base of the subframe.

; Times (including proc call overhead)
;   arg   k   time
;   0     2   325
;   1     2   460
;   10    2   1675
;   I     2   325+135*I
;   0     3   330
;   10    5   1690
;   I     K   325+135*I+5*K


label LIBRARY_PLACE=$180

label frame_size = 9
label frame_n_iteration_slots = 6
label argument = 10
label k = 5


normal frame


ptr to it. K-1
ptr to it. 0
ptr to it. 1

ptr to it. K-1
ptr to it. 0


subframe 0


subframe 1


subframe K-1

label total_frame_size = frame_size + k * (frame_n_iteration_slots + 1) + 2
label slotID = total_frame_size - 1
label slotK = 2
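
The frame-size arithmetic in the labels above can be checked with a short sketch. This is Python rather than MDP assembly, purely for illustration. It assumes the operator lost from the scan in `total_frame_size` is `+`, which is the reading that matches the frame picture (k subframes, one pointer per subframe, and two extra pointer slots). The function name `loop_frame_size` is hypothetical.

```python
def loop_frame_size(frame_size, k, slots_per_iteration):
    """Total words in a loop frame, following the reconstructed label formula."""
    # k subframes of `slots_per_iteration` words each, one iteration pointer
    # per subframe, plus two extra pointer slots at the ends of the pointer
    # run (the "K-1" entry before iteration 0 and the "0" entry after K-1).
    return frame_size + k * (slots_per_iteration + 1) + 2


# Values taken from the labels in the listing.
frame_size = 9
frame_n_iteration_slots = 6
k = 5

total_frame_size = loop_frame_size(frame_size, k, frame_n_iteration_slots)
slot_id = total_frame_size - 1  # mirrors "label slotID = total_frame_size - 1"

print(total_frame_size, slot_id)
```

With the listing's values this gives a 46-word frame, so `slotID` would be offset 45 under the stated assumption.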

include “libS.mdp" 

;; Program code

label CODE_PLACE=$400

module program_code

; ((:LABEL (:LITERAL (:SYMBOL :SQ-1))))

sq_1:
move [1,A3], R0
move R0, A2

; Altered order is a temporary kludge

; (:CNTT (:FRAME (:BASE 3)
;                :SUSPENSIVE)
;        (:LITERAL (:SYMBOL :SQ-4)))
DC {sq_4_msg_ref}
move 3, R1
call CNTT

; (:CNTT (:FRAME (:BASE 0)
;                :SUSPENSIVE)
;        (:LITERAL (:SYMBOL :SQ-3)))
DC {sq_3_msg_ref}
move 0, R1
call CNTT

move ip, R0
move R0, [0,A3]

; (:LABEL (:LITERAL (:SYMBOL :SEND-RESULT-0)))
send_result_0:
move [1,A3], R0
move R0, A2

; (:MOVR (:FRAME (:BASE 0)
;                :SUSPENSIVE)
;        (:LITERAL (:INTEGER 1))
;        (:FRAME (:BASE 7)
;                :SUSPENSIVE))
DC {local_movr_msg_ref}
move [7,A2], R1
send2 [0,A2], R0, 0
send [0,A2], 0
send2e 1, R1, 0

move ip, r0
move r0, [0,A3]
move [1,A3], R0
move R0, A2

; (:MOVR (:FRAME (:BASE 0))
;        (:LITERAL (:INTEGER 0))
;        (:FRAME (:BASE 6)
;                :SUSPENSIVE))
DC {local_movr_msg_ref}
move [6,A2], R1
send2 [0,A2], R0, 0
send [0,A2], 0
send2e 0, R1, 0

; (:TERMINATE)
suspend

(:L1BEL (:LITBRAL <:SYMBOL :SQ-4))) 

MOT* [1,13], RO 
more RO, 12 

(:MOVE (:FR1HE (:B1SE 3) 

:SUSPEISIVE} 

(:FR1ME (:B1SE 4))) 
moTS 4, Rl 
call LOOKUP 
moTS [3,12], Rl 

more Rl, [4,12] 

(:SUB (;FR1NE (:B1SB 2)) 

(:LITER1L (:IITEGER 2)) 

(:REGISTER 0)) 

(:MOVE (:LITER1L (;IITEGER 9)) 
(:REGISTER 1)) 

(:SUB (:REGISTER 1) 

(;LITER1L (:IITEGER 6)) 

(:REGISTER 2)) 

(:1DD (:REGISTER 1) 

(:LITERIL (:IITEGER 6)) 

(:REGISTER 3)> 

(:STPR (iREGISTER 2)) 

(:STCR (:REGISTER 1)) 

(:STIZ (:REGISTER 3)) 


Put bass ot ID memory into 11 


DC 

IIT:frama_size<<S7S_: 

more 

12, R2 

stag 

R2, IIT, R2 

add 

R2, RO, R2 

stag 

R2. IDDR, R2 

moTO 

R2. 11 

: Put 

k into Rl 

move 

[sIotK,12], Rl 

sub 

Rl, 1. Rl 

; Put 

base offset into R2 

mote 

2-l-frame_size, R2 

add 

R2. [2,12], R2 

; In 

this setup, 0 through k- 

! 

-1, k-1, k I 

; Loop throu^ 

DC 

IIT:maskIN 

or 

R2, RO, R2 

moTo 

1, R3 
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loop_s«tup_loop: 

moT* 112, [R3,il] 

It R3, Rl, RO 

add R2, fraii«_a_itaratioii_8lots, R2 

add R3, 1. R3 

bt RO, *loop_s8tup_loop 


; Stor« 

k-l slot, etc. 

DC 

IIT:*(maskIM 1 aaskPC) 

mud 

R2, RO, R2 

moT« 

R2, [R3,A1] ; k-l slot 

moT« 

R2, [0,A1] 

mow 

[1,A1], R2 

and 

R2, RO, R2 

add 

R3, 1, R3 

mofo 

R2, [R3,A1] 


;; (:UBEL (:LITERAL (:SYMBOL : SETUP-LOOP-6) ) ) 

satup_loop_E: 


(:BRZ (:REGISTER 0) 

;; (:LITERRL (:SYMBOL :EID-SETUP-LOOP-E)}) 

;; (iIZID (:LITERRL (iIITEGER 6)) 

;; (:PRIME (:IEZT-ITERZTIOI 0))) 

;; (:SUB (:REGISTER 0) 

;; (:LITERAL (:IITEGER 1)) 

;; (:REGISTER 0)) 

;i (:BR (iLITERIL (rSYMBOL :SETUP-LOOP-B))) 

;; (:LIBEL (:LITERIL (:SYMBOL :EID-SETUP-L00P-5))) 

and_satup_loop_6: 

;; (:STIZ (:REGISTER 2)) 

;; (:IZI0 (:LITERAL (:IITEGER 6)) 

;; (:REGISTER 4)) 

;; (;IZIO (:LITERAL (:IITEGER 0)) 

;; (:FRAME (:ITERATIOI 0))) 

; (:STIM (:FRAME (:ITERATIOI 0)} 

; (iLITERAL (:BOOLEAI :FALSE)) 

; (:FRAME (:ITERATIOI 0))) 

; In na« scbama, this maans set -I’s to zero. Done above. 

;; (:IZID (:LITERAL (:IITEGER 6)) 

;; (;REGISTER 4)) 

;; (:IIID (:LITERAL (;IITEGER 0)) 

;; (:FRAME (:ITERATIOI 0))) 

;; (:STIZ (:REGISTER 1)) 

; (;STPC (:FRAME (:IEZT-ITERATIOI 0)) 

; (:LITERAL (:SYMBOL :ITERATE-6)) 

; (:FRAME (:IEZT-ITERATIOI 0))) 

DC IIT:maskPC 
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moT* 

call 


1, R1 

CHECK_ITER 


(:TERHIIRTB) 

suspand 


(sLlBEL (:LITERiL (:SYMBOL :ITERRTE-E))) 


•-5: 

moT« 

[1,A3], R1 

moT« 

Rl, A2 

wtag 

Rl, IIT, Rl 

moT« 

[2,A3], R2 

mdd 

R2, frame.size-t’l 


CR2,A2], R2 

Ish 

R2, S7S_len_bits 

add 

Rl, R2, Rl 

wtag 

Rl, ADDR, Rl 

moT« 

Rl, A1 


R2 

Offset to bass of cur loop subframs nos in R2 
R2 


Bass of cur loop subfrans in 11 


(:HOVE (:FRAME (:ITERATIOI 1) 

tIOISTICKY :SUSPEISIVE) 

(:FRAME (iITERATIOI 6))) 

; offset S is internal to loop and need not be checked 
nose [1,A1], RO 

move RO, CE,A1] 

DC CFUT:$0 

move RO, [1,A1] 

(:LE (:FRAME (tITERATIOI E)) 

(:FRAME (:BASE 4)) 

(:FRAME (:ITERATIOI 4))) 
move CE,A1], RO 

le RO, [4,A2], RO 

move RO, C4,A1] 


(:BRF (:FRAME (:ITERATIOI 4)) 

(:LITERAL (:SYMBOL :EID-L00P-E))) 
move [4,A1], RO 

bf RO, *end_loop_E 

(:STPC (:FRAME (:IEXT-ITERATIOI 0)) 

(:LITERAL (:SYMBOL :ITERATE-S)) 
(:FRAME (:IEZT-ITERATIOI 0))) 

DC IIT:maskPC 

move 1, R1 

call CHECK.ITER 


(:ADD (:FRAME (:ITERATIOR E)) 

(: LITERAL (:IITEGER D) 

(:FRAME (:lEXT-ITERATIOI 1))) 


mov 

[2,A3], Rl 


add 

Rl» frama.siza+l'*’!, Rl ; naxt 

moT* 

CR1,A2], Rl 


add 

Rl, 1, Rl 

; offset 1 

DC 

IIT;*(maskIM I 

1 maskPC) 

and 

Rl, RO, Rl 


call 

LOOKUP 


moT« 

[E,A1], RO 


add 

RO, 1, RO 

; integer 1 

moTa 

RO, [R1,A2] 
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(:1DD (-.FRAHE (:ITEIUTIOI 2) 

:IOISTICKY sSUSPEISIVE) 
(sFUllE (:ITERATIOI E)) 

(:FRANE (sIEZT-ITEUTIOI 2))) 
rtag [2,A1], RO 

add Rl, 2-1, R1 

call LOOKUP 

HOT* [2.A1]. RO 

add RO, [E,A1], RO 

moT* RO, CR1,A2] 

DC CFUT:tO 

moT* RO, [2,A1] 


moT« 

ip, RO 


aoT« 

RO, [0,A3] 


mOT6 

Cl,A3], Rl 


moT« 

Rl, A2 


wtag 

Rl, IIT, Rl 


aoT« 

[2,A3], R2 


add 

R2, frsBa.siza+l, 

R2 

aoTa 

CR2,A2], R2 ; 

Offset to base of cur loop subfxaBO not in R2 

Ish 

R2, S7S_lan_bits, 

R2 

add 

Rl, R2, Rl 


wtag 

Rl, ADDR, Rl 


moT« 

Rl. il i 

Base of cur loop subfraBO in A1 


(:TST1 (;FRAME (:ITERATIOI 3) 

:IOISTICKY tSUSPEKSIVE) 
(:FRAHE (:IEXT-ITERATIQI 3))) 
ptag [3,A1], RO 

DC CFUT:$0 

moTC RO, [3,A1] 

iDOPa [2,A3], Rl 

add Rl, fraaia_siz«+l+l, Rl ; nazt 

mova [R1,A2], Rl 

add Rl, 3, Rl ; offsat 3 

DC IIT:'(maskIM I maskPC) 

and Rl, RO, Rl 

call LOOKUP 

Dova tma, RO 

mota RO, [R1,A2] 


; (:STIH (:FRAME (tPREVIOUS-ITERATIOl 0)) 

; (:LITERAL (:BOOLEAI :TRUE)) 

; (:FRAHE (iPREVIOUS-ITERATIOI 0))) 

DC IIT:maskIM 

BOTa -1, Rl 

call CHECK.ITER 

; (:TERMIIATE) 

suspend 

; (;LABEL (:LITERAL (rSYMBOL :EID-LOOP-E))) 

and_loop_E: 

BOTa [1,A3], Rl 

BOTa Rl, A2 
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stag 

Rl, IIT, R1 


aoTs 

[2,A3], R2 


add 

R2, fraaa.sise+l. 

R2 

more 

[R2,A2], R2 ; 

Offset to base of cur loop subframe nos in R2 

Ish 

R2, sjs_lan_bits. 

R2 

add 

Rl, R2, Rl 


stag 

Rl, ADDR, Rl 


aoTS 

Rl. Al : 

Base of cur loop subframe in Al 


; (:CITT (:FRU(E (sITEElTIOI 2) 

; :SUSPEISIVE) 

; (:LITER1L (:SYHBOL :COPY-LOOP-VARIRBLE-l))) 

DC {cop 7 _loop_Taxiabla_l_iiisg_r«f}' 

aoT* 2, R1 

call CITT.L00P 

; (:NOVE (:FR1ME (:ITER1TI0I E)) 

i (:FR1HE (:BASE 6))) 

aoTC 6, R1 

call LOOKUP 

HOT# [E,K1], RO 

moT* RO, [0,12] 

; (iTERHIIlTE) 

suspend 

; (;LIBEL (:LITERAL (:SYMBOL :C0PY-L00P-V1RI1BLE-1))) 

copy_loop_Taxiable-l: 


moT« 

[1.A3]. Rl 


moTe 

Rl, A2 


vtag 

Rl, IIT. Rl 


moT« 

[2,13], R2 


add 

R2, frame.size+l, 

R2 

mova 

[R2,A2]. R2 ; 

Offset to baso of cur loop subframe nos in R2 

Ish 

R2, s]rs_len_bits, 

R2 

add 

Rl, R2. Rl 


wtag 

Rl, ADDR, Rl 


moT« 

Rl. Al : 

Base of cur loop subfrasie in 11 


(:CITT (:FRAME (;ITERATIOI 3) 

:SUSPEISIVE) 

(tLITERAL (:SYMBOL :C0PY-L00P-VARIABLE-2))) 
DC {cop7_loop_Tariabla_2_msg_ref} 


move 

3, Rl 

call 

CITT_L00P 


move 

ip, RO 


moTe 

RO, [0,A3] 


move 

[1,A3], Rl 


noTe 

Rl, A2 


stag 

Rl, IIT, Rl 


noTe 

[2,A3], R2 


add 

R2, frame.size-fl, 

R2 

more 

[R2,A2], R2 ; 

Offset to base of cur loop subframe nos in R2 

Ish 

R2, S 7 s_len_bits, 

R2 

add 

Rl, R2, Rl 


stag 

Rl, ADDR. Rl 


moTe 

Rl. Al j 

Base of cur loop subframe in Al 


90 





; (:NOVE (:FUHE (:ITBRATIOI 3) 

; :IOISTICKY :SUSPEISIVE) 

; (:FRAHE (:BASE 7))) 

rtag [2,A1], RO 

moT« 7, R1 

call LOOKUP 

moT# [3,A1], RO 

BOTa RO, [7,A3] 

DC CFUT:$0 

BOT* RO, [3,A1] 

; (:TERHIIATE) 

suspaad 

; (:LABEL (:LITERAL (:SYMBOL :COPY-LOOP-VARIABLE-3})) 

cop7_loop_TaTiabla_3: 

BOT# [1,A3] , R1 

BOTa R1, A3 

stag Rl, IIT, R1 

BOTa [3,A3], R3 

add R3, fraBa.siza+l, R2 

BOTa [R3,A3], R3 ; Offaat to basa of cur loop subfraBa now in R2 

Ish R3, STa_lan.bits, R3 

add Rl, R3, Rl 

stag Rl, ADDR, Rl 

BOTa Rl, A1 ; Basa of cur loop subfraBa in A1 

; (:H0VE (:FRAME (:ITERATIOI 3) 

; :IOISTICKY :SUSPEISIVE) 

; (:FRAME (:BASE 8))) 

rtag [3,A3], RO 

BOTa 8, Rl 

call LOOKUP 

BOTa [3,A1], RO 

BOTa RO, [8,A3] 

DC CFUT:$0 

BOTa RO, [3,A1] 


(:MOVE (sLITERAL (tSYMBOL tSIGIAL)) 

(:FRAME (:BASE 6))) 

BOTa trua, RO 

BOTa RO, CE,A3] 

(:TERRIIATE) 

suspand 

(:LABEL (;LITERAL (:SYMB0L :Sq-3))) 

BOTa [1,A3], RO 

BOTa RO, A3 

(:ROVE (:FRARE (:BASE 0) 

:SUSPEISIVE) 

(:FRAME (:IBXT-ITERATIOI 3))) 
rtag [0,A3], RO ; Chack if Talua tbara 

BOTa [3,A3], R3 
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add R2, frama.siza+l+l, R2 

moT« [R2,A2], R2 

add R2. 3, R1 

DC IIT:*(maskIH I maskPC) 

and Rl, RO, R1 

call LOOKUP 

HOT# [0,A2], R2 

■ora R2, [R1.A2] 

; (:H0VE (: LITERAL (-.IITEGER D) 

; (:FRAHE (:IEXT-ITERATIOI 1))) 

sub Rl. 3-1, Rl 

call LOOKUP 

moT* 1. RO 

moT* RO, [R1,A2] 

; (:HOVE (sLITERAL (;IITEGER 0)) 

; (:FRAME (:IEXT-ITERATIOI 2))) 

add Rl. 2-1. Rl 

call LOOKUP 

moTa 0, RO 

moTa RO, [R1.A2] 

; (:TER1IIIATE)) 

suspand 

end

ref sq_3_msg_ref = MSG: ((sq_3+CODE_PLACE) << sys_len_bits) + 3
ref sq_4_msg_ref = MSG: ((sq_4+CODE_PLACE) << sys_len_bits) + 3

ref copy_loop_variable_1_msg_ref=MSG:((copy_loop_variable_1+CODE_PLACE) << sys_len_bits) + 3
ref copy_loop_variable_2_msg_ref=MSG:((copy_loop_variable_2+CODE_PLACE) << sys_len_bits) + 3
ref iterate_5_msg_ref = MSG: ((iterate_5+CODE_PLACE) << sys_len_bits) + 3
ref loop_msg_ref = MSG: ((iterate_5+CODE_PLACE) << sys_len_bits) + 3
ref LOOP_CB = CB: ((sq_1+CODE_PLACE)<<16) + total_frame_size

place program_code, CODE_PLACE

label TOP_PLACE = $E00

;; Top level code
module top_code

;; Create the frame
; First we must allocate a frame
DC {top_1_ref}
move R0, R3
DC {loop_cb}
move R0, R1
call ALLOCATE

top_1:
move R2, A2
DC FD:$600<<16 ; Where to put result
move R0, [0,A2]
move argument, R1 ; Argument
move R1, [3,A2]
move k, R1
move R1, [slotK,A2] ; K
DC MSG: ((SQ_1+CODE_PLACE)<<sys_len_bits)+3
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Appendix B 


MDP Library Code 


B.l General Library 


This file holds the library for VIDI program execution on the J-machine.
It puts it all in a module library_code.
It includes and defines as necessary and loads the system call vector
with the following (i for input, o for output):

ALLOCATE (0) - Allocate a frame on the current node given a codeblock.
               R1 (i) holds CB. Addr result will be in R2 (o).
LOOKUP (1) - Check a location in a frame before writing to it
             in order to start any waiting processes.
             [R1,A2] (i) holds data.
CNTT (2) - Same as VIDI CNTT. R1 (i) holds test location,
           R0 (i) holds MSG to be spawned.
CALLOC (3) - Allocate the number of words in R1 (i) and
             return the result as an ADDR in R2 (o).
LOOKUP_ITER (4) - Like LOOKUP, but takes its offset from A1;
                  thus it takes a base in A1 (i) and an offset in R1 (i).
CHECK_ITER (5) - A new value for an ID in R1 (i) is put in the
                 first spot of A1 (i) and starts up the loop if
                 the import and PC flags are set. For now, no
                 PC field included in ID. This must be fixed up.

Various fault handlers are also defined:

CFUT - Replace accessed location with info about current continuation
       then suspend.
SEND - Continue after unavoidable delay.

Additionally, some methods accessed by non-local MSGs are supplied:

LOCAL_MOVR - Takes a MSG of the form:
               FD
               INT:offset1
               ANY:value1
             Eventually, one will be able to send any number
             of INT,ANY pairs. This stores the values into
             the offsets of the specified frame.
LOCAL_GETC - Takes a MSG of the form:
               CB to allocate frame for
               FD to send new FD to
               INT offset in destination FD
             Locally allocate a frame and send it back
             to requesting node. Also start up code block
             on current node.


label FREE_PTR = $A00
label STACK_BASE = $A01

; stack space will be from $a01 up. $a00
; will hold the first free location (not last used).

; IXCC creates (from an ADDR) a frame descriptor FD:
;   31 ... 16   15 ... 0
;   <addr>      NNR
; where <addr> should be right-shifted four to be properly placed.
;
; Similarly, a codeblock is typed CB and is:
;   31 ... 16   15 ... 0
;   <addr>      <frame size>
;
; To summarize:
;   GETC: CB -> ADDR (allocate_loc)
;   IXCC: ADDR -> FD
;   MOVR: FD x (INT x ANY)*
;   CNTT: ADDR -> ADDR (because it stays on same processor)
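
The two word layouts described above (FD and CB) can be modelled with a few bit operations. This is a minimal sketch in Python, not library code. It ignores MDP tag bits, treats words as plain 32-bit integers, and assumes the address field of an ADDR word sits above `sys_len_bits` length bits, as the shifts in `local_movr` and `local_getc` suggest. All function names are hypothetical.

```python
SYS_LEN_BITS = 10  # mirrors "label sys_len_bits = 10"


def make_cb(code_loc, frame_size):
    """Codeblock word: code address in bits 31..16, frame size in bits 15..0."""
    return ((code_loc & 0xFFFF) << 16) | (frame_size & 0xFFFF)


def cb_code_loc(cb):
    return cb >> 16


def cb_frame_size(cb):
    return cb & 0xFFFF


def make_fd(frame_base, node):
    """Frame descriptor: frame base in bits 31..16, node number in bits 15..0."""
    return ((frame_base & 0xFFFF) << 16) | (node & 0xFFFF)


def fd_node(fd):
    return fd & 0xFFFF


def fd_to_addr(fd):
    """Recover an ADDR-style word: drop the node number, then shift the base
    up past the length bits (the two shifts performed in local_movr)."""
    return (fd >> 16) << SYS_LEN_BITS


cb = make_cb(0x400, 21)
fd = make_fd(0xA01, 3)
print(hex(cb), hex(fd), hex(fd_to_addr(fd)))
```

Packing the node number into the descriptor is what lets a MOVR be routed to the right node from the FD alone.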


;; J-machine constants 

include "/home/gn/ellens/Id/hs.mdp" 

include "/home/gn/ellens/Id/nesq.mdp" 

label sys_len_bits = 10

label ABSOLUTE = (1<<8)

label UNCHECKED = (1<<31)

;; Constants for loops

label posPrevious = 0
label maskPrevious = $0000ff
label posCurrent = 8
label maskCurrent = $ff00
label posNext = 16
label maskNext = $ff0000
label posIM = 24
label maskIM = 1<<posIM
label posPC = 25
label maskPC = 1<<posPC
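
As a reading aid, the loop constants above describe one iteration-descriptor (ID) word: three 8-bit fields plus two flag bits. The Python sketch below packs and unpacks such a word and applies the "both flags set" test that `check_iter_loc` performs. It assumes `posPC` is 25 (the scan is unclear on this digit) and that the flags mean "imports present" and "PC stored". The helper names are hypothetical.

```python
POS_PREVIOUS, MASK_PREVIOUS = 0, 0x0000FF
POS_CURRENT, MASK_CURRENT = 8, 0x00FF00
POS_NEXT, MASK_NEXT = 16, 0xFF0000
POS_IM = 24
MASK_IM = 1 << POS_IM
POS_PC = 25  # assumed reading of the listing
MASK_PC = 1 << POS_PC


def pack_id(previous, current, nxt, im=False, pc=False):
    """Build an iteration descriptor from its fields."""
    word = (previous << POS_PREVIOUS) & MASK_PREVIOUS
    word |= (current << POS_CURRENT) & MASK_CURRENT
    word |= (nxt << POS_NEXT) & MASK_NEXT
    if im:
        word |= MASK_IM
    if pc:
        word |= MASK_PC
    return word


def unpack_id(word):
    """Return (previous, current, next, im_flag, pc_flag)."""
    return (
        (word & MASK_PREVIOUS) >> POS_PREVIOUS,
        (word & MASK_CURRENT) >> POS_CURRENT,
        (word & MASK_NEXT) >> POS_NEXT,
        bool(word & MASK_IM),
        bool(word & MASK_PC),
    )


def loop_can_start(word):
    """True only when both the IM and PC flags are set."""
    both = MASK_IM | MASK_PC
    return (word & both) == both


half_ready = pack_id(4, 0, 1, im=True)
ready = half_ready | MASK_PC
print(unpack_id(ready), loop_can_start(half_ready), loop_can_start(ready))
```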

;; User-defined tags
tagname 8 "CB"
tagname 9 "FD"
tagname 10 "ISA"

; System calls

label ALLOCATE = 0
label ALLOCATE_VECTOR = 0
label LOOKUP = 1
label LOOKUP_VECTOR = 1
label CNTT = 2
label CNTT_VECTOR = 2
label CALLOC = 3
label CALLOC_VECTOR = 3
label LOOKUP_ITER = 4
label LOOKUP_ITER_VECTOR = 4
label CHECK_ITER = 5
label CHECK_ITER_VECTOR = 5

namevector ALLOCATE+32, "Allocate"
namevector LOOKUP+32, "Lookup"
namevector CNTT+32, "CNTT"
namevector CALLOC+32, "Calloc"

; Constants for calloc

; For best efficiency, (ISTRUCT_Q_SIZE - 1) % ISTRUCT_Q_ENTRY_SIZE = 0
label ISTRUCT_Q_SIZE = 8
label ISTRUCT_Q_ENTRY_SIZE = 2

module library_code

terrible:
halt 0
br ^terrible

; In case of cfut fault, replace CFUT with continuation info.
; Type checking is turned off when this interrupt is entered!
; When we get here, [0,A3] either contains a valid MSG, or
; it contains an IP with p=1, a=1
fault_cfut_loc:

moTo RO, IDO ; *•••• 

moro HIR, R1 

; It tbla point, R1 bolda addroaa to atoro polntar In 
lault.olut.nono.allooatod: 

: allooato a trlplo Irom ataok 
DC addr :FREE.PTR«aya.lan.blta 

moTO RO, 11 

mova [0,11], R2 

DC IIT: 3 « 87 a.lon.blt 8 

add R2, RO, R3 

moto R3, [0,11] 

moTO R2, 11 

; R2 and 11 noa point to ompty trlplo 


noT« 

IDO, R3 

; ***** 

f ault.clut.msg. 

okay: 


mova 

R3, [0,11] 


mova 

12, RO 


moTa 

RO, [1,11] 


moTa 

[Rl,10], RO 


sova 

RO, [2,11] 


wtag 

R2, CFUT, R2 

; Writ* tbo trlplo to abaro Rl points 

aora 

R2, [Rl,10] 


suspand 



; R1 la a CB altb Input Inlo. 

; roault alll bo an IDDR In R2. 

Clobbors roglstors (ozoopt 12). 

allocata.loc: 



chaek 

Rl, CB, R2 


bf 

R2, ‘torrlblo 


vtag 

Rl, IIT, Rl 

; lot strictly noodod 

and 

Rl, $1111, Rl 

; Got slzo 

DC 

IDDR: FREE.PTR«sya.lan.blts 

mova 

RO, 11 


moTa 

[0,11], R2 


Ish 

Rl, 8 ys.lan.blta 

, Rl i Sbllt slzo count Into placa 

add 

R2, Rl, Rl 
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noTe 


Rl, [O.Al] 


¥tag R3, ADDR. R3 
moT« lip> ip 

; We need support for MOVR. The format of the message should be:
;   FD
;   INT:offset1
;   ANY:value1
; The number of items can be determined from the message header.
; It must be 1 (for now). This also runs in unchecked mode.
local_movr:


mova 

[1,A3], Rl : Put 

framo doscriptor into Rl 

ebsek 

Rl, FD, RO 


bf 

RO, ‘tarribla 


utag 

Rl, IIT, Rl 


Isb 

Rl, -18, Rl 

; Shift out node number 

Isb 

Rl, sys_lan_bits, Rl 

; Shift it into address position 

stag 

Rl, ADDR, Rl 


0 

o 

• 

Rl, A1 


; First 

(and only) word 


movs 

[3,A3], Rl 


noTs 

[3,A3], R3 


SLOTS 

[R1,A1], RO 

; SsTe to see if anything waiting 

SLOTO 

R3, [R1,A1] 


bs 

RO, ~local_movr_dona 



; We must restart a continuation because R0 <> 0.

local_movr_next_triple:
move R0, A1
move NNR, R1
send R1, 0
send [0,A1], 0
sende [1,A1], 0
; We would deallocate the triple around here
move [2,A1], R0
bnz R0, ^local_movr_next_triple

local_movr_done:
suspend

; When a getc is done, it initiates a split-phase transaction
; (according to Iannucci's instruction). It sends a message to the
; desired node of the form:
;   <header>                        [0,A3]
;   CB to allocate frame for        [1,A3]
;   FD to send resulting new FD to  [2,A3]
;   Offset within FD                [3,A3]
; The job of local_getc, after allocating space, is to notify the
; caller and to set the frame in motion. For obvious reasons, it
; does the two subtasks in that order.

local_getc:
; Set up for ALLOCATE call
DC IP: (local_getc_1<<sys_len_bits)+ABSOLUTE
move R0, R3
move [1,A3], R1
call ALLOCATE

local_getc_1:
; Build up the FD and send it back
DC {local_movr_msg_ref}
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stag R3. IIT. R3 

Ish R3, 16-sys_l«n_bits, B3 

HOT* m, R1 

add US, Rl. 113 

stag R3, FD, R3 

sand2 [2,13]. RO. 0 

sand [2,13], 0 

sand2a [3,13], R3, 0 

; Sat up for aathod spacifiad by coda block 
■ora [1,13], R1 

stag Rl. IIT, R1 

Ish Rl, -16, Rl ; Shift off los bits 

Ish Rl, sys_lan_bits, Rl 

add Rl, 2, Rl ; Put in langth bits 

stag Rl, HS6, Rl 

; Tha IDDR is still in R2 
mora HR, RO 

sand RO, 0 

sand2a Rl, R2, 0 

suspand 

; fault_send_loc is used to wait, when we send messages too fast.
; This routine is lifted, verbatim, from Waldemar's MS thesis.
; It requires type checking to be disabled.
fault_send_loc:
move fip, R0
rot R0, -9, R0
sub R0, 1, R0
rot R0, 9, R0
move R0, fip
move fop0, R0
move fip, ip

; This expects R1 to hold the offset from A2.
; Only R1, A2, and A3 are guaranteed. Checking must be off.
lookup_loc:
move [R1,A2], R0
check R0, CFUT, R2
bf R2, ^terrible ; Double write
bz R0, ^lookup_done
move NNR, R3

lookup_next:
move R0, A1
send R3, 0
send [0,A1], 0
; sende [1,A1], 0
sende A2, 0
; Deallocate triple
move [2,A1], R0
bnz R0, ^lookup_next

lookup_done:
move fip, ip

; For CNTT, R1 should hold the offset of the test location,
; and R0 should hold the message name. At least for now,
; the continuation will be spawned to the same node.
; Checking should be off (to avoid CFUT faults).
cntt_loc:
move [R1,A2], R2 ; Check test location
check R2, CFUT, R3 ; Is it a CFUT?
bf R3, ^cntt_send_it ; If not, we can send

; Instead, key it on [R1,A2]
move R0, ID0 ; Save MSG

; allocate a triple from stack
DC addr:FREE_PTR<<sys_len_bits
move R0, A1
move [0,A1], R2
DC INT:3<<sys_len_bits
add R2, R0, R3
move R3, [0,A1]

; R2 holds base of triple
move R2, A1

; Fill in triple
move ID0, R0 ; Restore it
move R0, [0,A1]
move A2, R0
move R0, [1,A1]
move [R1,A2], R0
move R0, [2,A1]

; Store pointer to new triple
wtag R2, CFUT, R2 ; <--
move R2, [R1,A2]
move fip, ip

cntt_send_it:
move NNR, R1
send2 R1, R0, 0
sende A2, 0
move fip, ip

; R1 holds the number of words requested.
; Result will be an ADDR in R2.
; Preserves A1 through A3.
calloc_loc:
DC ADDR:FREE_PTR<<sys_len_bits
move A1, R3
move R0, A1
move [0,A1], R2
lsh R1, sys_len_bits, R1 ; Shift size count into place
add R2, R1, R1
move R1, [0,A1]
wtag R2, ADDR, R2
move R3, A1
move fip, ip

; This expects R1 to hold the offset from A1.
; Only R1, A1, A2, and A3 are guaranteed. Checking must be off.
lookup_iter_loc:
move [R1,A1], R0
; check R0, CFUT, R2
; bf R2, ^terrible
bz R0, ^lookup_iter_done


; Take old pointer 
; Put it at end of triple 
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■OT* 


; Sara it 


11, 113 
moT# K3, 10 

BOT« HR, R3 
looknp_itar_nazt: 

BOT# RO, 11 

sand R3, 0 

sand [0,11], 0 

sanda [1,11], 0 

: Daalloeata txipla 
Bosa [2,11], RO 
bns RO, 'lookup.itar.naxt 

BOTa 10, R3 

Bora R3, 11 

lookup.itar.dona: 

BOTa lip, ip 

; Check_iter_loc expects R1 to have the value to put into ID [0,A1].
; It moves it there and starts the loop if both flags are set.
; It saves the address registers.
check_iter_loc:
DC INT:maskIM + maskPC
move R1, [0,A1]
and R1, R0, R1
eq R1, R0, R2
bt R2, ^check_iter_start
move fip, ip

check_iter_start:
DC {loop_msg_ref}
move NNR, R1
send2 R1, R0, 0
send A1, 0
sende [slotID,A2], 0
move fip, ip

end

fault_vec_addr_p0 + fault_cfut = IP: ((LIBRARY_PLACE+fault_cfut_loc)<<sys_len_bits) + ABSOLUTE+UNCHECKED
fault_vec_addr_p0 + fault_send = IP: ((LIBRARY_PLACE+fault_send_loc)<<sys_len_bits) + ABSOLUTE+UNCHECKED
syscall_vec_addr + ALLOCATE = IP: ((LIBRARY_PLACE+allocate_loc)<<sys_len_bits) + ABSOLUTE
syscall_vec_addr + LOOKUP = IP: ((LIBRARY_PLACE+lookup_loc)<<sys_len_bits) + ABSOLUTE+UNCHECKED
syscall_vec_addr + CNTT = IP: ((LIBRARY_PLACE+cntt_loc)<<sys_len_bits) + ABSOLUTE+UNCHECKED
syscall_vec_addr + CALLOC = IP: ((LIBRARY_PLACE+calloc_loc)<<sys_len_bits) + ABSOLUTE
ref local_movr_msg_ref = MSG: ((LIBRARY_PLACE+local_movr)<<sys_len_bits)+UNCHECKED+4
ref local_getc_msg_ref = MSG: ((LIBRARY_PLACE+local_getc)<<sys_len_bits)+4
ref local_fetch_msg_ref = MSG: ((LIBRARY_PLACE+local_fetch)<<sys_len_bits)+UNCHECKED+4
ref local_store_msg_ref = MSG: ((LIBRARY_PLACE+local_store)<<sys_len_bits)+UNCHECKED+3

syscall_vec_addr + LOOKUP_ITER = IP: ((LIBRARY_PLACE+lookup_iter_loc)<<sys_len_bits) + ABSOLUTE+UNCHECKED
syscall_vec_addr + CHECK_ITER = IP: ((LIBRARY_PLACE+check_iter_loc)<<sys_len_bits) + ABSOLUTE


include "lotsozots.mdp" ; CFUTUREs for stack
FREE_PTR=INT: STACK_BASE<<sys_len_bits
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B.2 I-Structure Routines 


; This is a changed version of istructS.mdp that uses different
; representations:
;   EMPTY - null CFUT
;   WAITING - non-null CFUT
;   DATA - non-CFUT
; It also goes through local_movr.

; The format of i-structure addresses are:
;   The low 16 bits hold the node number
;   The high 16 bits hold the address on that node

; I-structure addresses are typed TAGS, which will be defined
; to ITAG.
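
The three slot states listed above (EMPTY, WAITING, DATA) and the deferred-read behaviour they support can be sketched in a few lines of Python. This is an illustrative model only, not a transcription of the MDP routines. A `Deferred` object stands in for a CFUT, its list of waiters stands in for the linked list of triples, and a Python callback stands in for sending a `local_movr` message to the waiting frame. All names are hypothetical.

```python
class Deferred:
    """Stand-in for a CFUT: no waiters means EMPTY, some waiters means WAITING."""

    def __init__(self):
        self.waiters = []


class IStructure:
    def __init__(self, lower, upper):
        self.lower = lower
        self.upper = upper
        self.slots = [Deferred() for _ in range(upper - lower + 1)]

    def _index(self, offset):
        # Same two bounds checks as system_fetch and system_store.
        if offset > self.upper or offset < self.lower:
            raise IndexError("i-structure offset out of bounds")
        return offset - self.lower

    def fetch(self, offset, reply):
        """Deliver the value to `reply` now, or queue `reply` until a store."""
        slot = self.slots[self._index(offset)]
        if isinstance(slot, Deferred):
            slot.waiters.append(reply)  # slot becomes (or stays) WAITING
        else:
            reply(slot)  # DATA: answer immediately

    def store(self, offset, value):
        """Write once, then wake every reader that was waiting on the slot."""
        i = self._index(offset)
        slot = self.slots[i]
        if not isinstance(slot, Deferred):
            raise ValueError("write-twice error")
        self.slots[i] = value
        for reply in slot.waiters:
            reply(value)


received = []
s = IStructure(1, 4)
s.fetch(2, received.append)  # too early: the read is deferred
s.store(2, 42)               # wakes the deferred read
s.fetch(2, received.append)  # data present: answered at once
print(received)
```

The point of the representation is that a reader never polls: a fetch that arrives early leaves a continuation behind, and the single store that fills the slot is responsible for restarting every such continuation.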

;; This is what code to fetch an I-structure call looks like:

; With the pointer (tagged int) in R1 and the I-struct offset in R2,
; and the frame offset in R3.
i_fetch_code:
dc {system_fetch_msg_ref}
send20 R1, R0 ; Send node number, header
send20 R1, R3 ; Send ISA
send20e A2, R3 ; Send frame, offset
suspend




; System fetch gets:
;   [0,A3]: MSG:<system-fetch>
;   [1,A3]: INT:<i-structure address>
;   [2,A3]: INT:<offset from i-structure>
;   [3,A3]: FD:<frame of dest>
;   [4,A3]: INT:<offset from frame>
;
; WARNING: SENSITIVE TO BIT CHANGES:
; Specifically, assumes SYS_LEN_BITS = 10,
; MAX_NODES = 2^16

system_fetch:
move [1,A3], R1 ; Put ISA in R1
lsh R1, -16, R1 ; Slide over address portion to del node #
lsh R1, 10, R1 ; Slide into address position
move R1, A1
move [2,A3], R2 ; Put offset into R2
gt R2, [1,A1], R3 ; If it's greater than upper bound...
bt R3, ^i_err ; ...it's an error.
move [0,A1], R0 ; Put lower bound in R0
sub R2, R0, R2 ; Subtract off base
lt R2, 0, R3 ; If it's lower than base...
bt R3, ^i_err ; ...then it's an error
add R2, 2, R2 ; Point past two bounds words
move [R2,A1], R1 ; Take item in i-structure spot
check R1, CFUT, R0
bt R0, ^data_not_present

; If we get here, we have the data and can return it.
send0 [3,A3] ; Node number of destination
send0 [4,A3] ; MSG header of destination
send20e [1,A3], R0 ; context, value
suspend
;;; This cass handlss both a first and subsaqusnt stors. 
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;;; It allocates a tripla for a linkod-list. 
data.not.proaont: 


i If «• 

g«t her*, [R2.11]. th« rafaranc*. R1. holds s ofutura. 

; Got triplo 


DC 

IDDR: FREE_PTR«S 7 B_laii_bits 

BOTO 

RO, 12 


DC 

3<<sys_lon_bits 


BOTO 

[0.12]. R3 

Put start location in R2 

BMTO 

R3. 12 

12 nos holds a ptr to a nas tripla 

add 

R3. RO. R3 

Put nazt fraa location in R3... 

BOTO 

R3. [0.12] 

...and than back into fraa ptr 

; Storo 

12 into i-structuro location 

BOTO 

12. RO 


BOTO 

RO. [R2.11] 


; storo 

in tha follosing 

ordor: 

• 

• 

Frama nmabar of dostination 

Offsat w/in frama 

> 

■art ptr 


BOTO 

[3.13]. RO 

Framo niunbor 

BOTO 

RO. [0.12] 


BOTO 

[4.13], RO 

Frsmo offset 

BOTO 

RO, [1.12] 


BOTO 

Rl. [2,12] 

lozt ptr 

suspond 



; System-store gets:
;   [0,A3]: MSG:<system-store>
;   [1,A3]: INT:<i-structure address>
;   [2,A3]: INT:<offset>
;   [3,A3]: <data>

system_store:
move [1,A3], R1 ; Put ISA in R1
lsh R1, -16, R1 ; Slide over address portion to del node #
lsh R1, 10, R1 ; Slide into address position
move R1, A1 ; A1 now holds abs address of base
move [2,A3], R2 ; Put offset into R2
gt R2, [1,A1], R3 ; If it's greater than upper bound...
bt R3, ^i_err ; ...it's an error.
move [0,A1], R0 ; Put lower bound in R0
sub R2, R0, R2 ; Subtract off base
lt R2, 0, R3 ; If it's lower than base...
bt R3, ^i_err ; ...then it's an error
add R2, 2, R2 ; Point past two bounds words
move [R2,A1], R1 ; Take item in i-structure spot
check R1, CFUT, R0 ; It had better be a cfuture.
bf R0, ^i_err ; If not, it's a write-twice error.
move [3,A3], R3 ; Put data value into R3
move R3, [R2,A1] ; Store it into i-structure
DC {local_movr_msg_ref}
bz R1, ^sends_done

; At this point, R1 holds base of next linked-list entry.
; R0 holds the local_movr_msg_ref.

send_loop:
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B.3 Loop Support 


;; Constants for loops

label posPrevious = 0
label maskPrevious = $0000ff
label posCurrent = 8
label maskCurrent = $ff00
label posNext = 16
label maskNext = $ff0000
label posIM = 24
label maskIM = 1<<posIM
label posPC = 25
label maskPC = 1<<posPC

; System calls
label CHECK_ITER = 5

; Expects R0 to have the value (maskIM or maskPC) to be or'd into
; the ID R1 (+/-1) off from the current iteration.
; It moves it there and starts the loop if both flags are set.
; It saves the address registers.
; For now only, ignore wraparound
check_iter_loc:

move [3,13], R3 

add R3, Rl, R3 

; This sequence converts a value of k to 0. Trust me.
; ge R3, [3,A3], R3
; wtag R3, INT, R3
; neg R3, R3
; and R3, [3,A3], R3
; sub R2, R3, R3

; Yo! I can do even better:
; lt R2, [2,A2], R3
; wtag R3, INT, R3
; neg R3, R3
; and R2, R3, R2

; Whoops, must also convert -1 to k-1
; sub R2, [slotK,A2], R1
; ge R1, -1, R1
; wtag R1, INT, R1
; neg R1, R1
; and R1, [slotK,A2], R1
; sub R2, R1, R2
; Oops: above converted k-1 to 1, not v.v.


lt R2, [slotK,A2], R3
wtag R3, INT, R3
neg R3, R3
and R2, R3, R2
lt R2, 0, R3
wtag R3, INT, R3
neg R3, R3
and R3, [slotK,A3], R3
add R2, R3, R3
add R2, frame_size+1, R2
move [R2,A2], R1
or R1, R0, R1
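
The compare / retag / negate / and sequence in this routine is a branch-free way to wrap an iteration index: a value of k becomes 0 and a value of -1 becomes k-1. The Python sketch below restates that idea under two assumptions: a true comparison result converts to the integer 1 when retagged as INT, and the index is only ever one step outside the range 0..k-1. The function name is hypothetical.

```python
def wrap_iteration_index(i, k):
    """Map k -> 0 and -1 -> k-1 without branching; other values pass through."""
    # -(i < k) is all ones (-1) when i < k and 0 otherwise, so the AND keeps
    # an in-range i and clears an i that has run off the top.
    keep = -int(i < k)
    i &= keep
    # -(i < 0) is all ones when i is negative, so the AND selects k in that
    # case and 0 otherwise; adding it lifts -1 up to k-1.
    lift = -int(i < 0) & k
    return i + lift


print([wrap_iteration_index(i, 5) for i in range(-1, 6)])
```

Negating a 0/1 truth value to get an all-zeros or all-ones mask replaces two conditional branches with four arithmetic instructions per test, which matters in a routine that runs on every iteration hand-off.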
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Appendix C 

Source Code 


C.l Convert Hybrid to Complex J 


;;; Mode:Common-Lisp; Package:ID-COMPILER; Base:10
;;; hybrid-to-cj converts hybrid code to complex J-machine code.
;;; The next step is to send it through cj-to-sj to change it to
;;; J-machine s-expressions.

(in-package 'id-compiler)

(defcompiler-module convert-hybrid-to-complex-j id-compiler
  (:input vnd-instructions code-block)
  (:function convert-hybrid-to-cj)
  (:output vnd-instructions code-block) ; This is a lie
  ; (:before-function procedure file-asm-before-def)
  ; (:after-function procedure asm-after-def)
  ; (:wrapper-macro vnd-file-assembler-wrapper)
  ; (:options input-file vnd-output-file vnd-output-file-format)
  )


; J-machine constants

; Originally, these were numbers. They are more readable as symbols and
; can be replaced by MDPSim. The constants are needed to know if they're okay literals.

(defconstant *sys-len-bits* 10)
(defconstant sym-tag 'sym)
(defconstant sym 0)
(defconstant int-tag 'int)
(defconstant int 1)
(defconstant fd-tag 'fd)
(defconstant fd 9)
(defconstant boolean-tag 'bool)
(defconstant bool 2)
(defconstant addr-tag 'addr)
(defconstant addr 3)

; For REFs and SYMBOLs (?)
(defconstant special-tag 'special_tag)
(defconstant special_tag 33)

(defconstant allocate-vector 'allocate_vector)
(defconstant allocate_vector 0)
(defconstant lookup-vector 'lookup_vector)
(defconstant lookup_vector 1)
(defconstant cntt-vector 'cntt_vector)
(defconstant cntt_vector 2)
(defconstant cntt-loop-vector 'cntt_loop_vector)
(defconstant cntt_loop_vector 3)
(defconstant calloc-vector 'calloc_vector)
(defconstant calloc_vector 4)
(defconstant check-iter-vector 'check_iter_vector)
(defconstant check_iter_vector 5)

(defconstant *posIM* 24)
(defconstant *posPC* 26)
; Each mask is a single bit at the given position, so the base must be 2.
(defconstant *maskIM* (expt 2 *posIM*))
(defconstant *maskPC* (expt 2 *posPC*))

(dsfnn eonTsrt*hybrid~to-cj (cb) 

(Ist* ((cj'lnstructions (conTsrt-kybrid-to-cj-innsr (dataflov-graph-root-sst cb) 

(dataflo¥-graph-gst cb :frains-dsseriptor)))) 
(sstf (dataflov-grapk'root'sst cb) ej*instnictions)) 
cb) 

(defun convert-hybrid-to-cj-inner (instructions frame-desc)
  (if (null instructions)
      nil
      (let* ((instruction (car instructions))
             (opcode (car instruction))
             ; Get rid of hybrid register references -- ouch
             (operands (mapcar #'transform-hybrid-register (copy-list (cdr instruction))))
             (suspensive-code (mutate-suspensive-operands opcode operands))
             (fn (convert-opcode-to-fn opcode)))
        (if (null fn)
            (my-error :fatal nil (format nil "No opcode for function ~A" opcode)))
        (append
          `((hybrid-instruction ,instruction))
          suspensive-code
          (apply fn frame-desc operands)
          (convert-hybrid-to-cj-inner (cdr instructions) frame-desc)))))

(defvar *conversion-list*)

(defun convert-opcode-to-fn (op)
  (cdr (assoc op *conversion-list*)))

;; Very inefficient
(defun transform-hybrid-register (op)
  (if (and (listp op)
           (eq (car op) :register)
           (numberp (second op)))
      `(:temporary (:base ,(second op)))
      op))
(defmacro suspensivep (operand)
  `(member :suspensive ,operand))

;;; A few hours with this section could yield some major optimizations,
;;; not to mention what could be done with register allocation.

(defun mutate-suspensive-operands (opcode operands)
  (let ((suspensive-code (mutate-suspensive-operands-inner operands)))
    ; special
    (if (not (eq opcode :continue-test))
        (if suspensive-code
            (cons '(suspensive-instruction)
                  ; remove-duplicates to ensure only one
                  ; check for (:add (:suspensive X) (:suspensive X) Y)
                  (append (remove-duplicates suspensive-code :test #'equal)
                          '((suspensive-check-done))))))))

(defun mutate-suspensive-operands-inner (operands)
  (if (null operands)
      nil
      (append
        (if (suspensivep (car operands))
            (progn
              (setf (car operands) (remove :suspensive (car operands)))
              `((suspensive-operand ,(car operands)))))
        (mutate-suspensive-operands-inner (cdr operands)))))

(defvar *conversion-list*)
(setq *conversion-list* nil)

(defmacro defconversion (hybrid-name hybrid-symbol operands body)
  (progn
    (setq *conversion-list*
          (cons (cons hybrid-symbol hybrid-name)
                *conversion-list*))
    (let ((full-op-list (cons 'frame-desc operands)))
      `(defun ,hybrid-name ,full-op-list
         'frame-desc
         ,body))))

(defun frame-base-offset (operand)
  (if (eq (car operand) :frame)
      (base-offset operand)
      (error :fatal nil "Illegal operand supplied when frame-base value expected.")))

;; Used by cj-to-sj
(defun message-base-offset (operand)
  (if (eq (car operand) :message)
      (base-offset operand)
      (error :fatal nil "Illegal operand supplied when message-base value expected.")))

(defun base-offset (operand)
  (if (eq (car (second operand)) :base)
      (second (second operand))
      (error :fatal nil "Illegal operand supplied when base-offset value expected.")))

(defun literal-base-offset (operand)
  (if (and (eq (car operand) :literal)
           (eq (car (second operand)) :base))
      (second (second operand))
      (error :fatal nil "Illegal operand supplied when literal-base value expected.")))

(defconversion getc :get-context (context-slot return-slot)
  `((reserve (:register scratch))
    (move (:j-register A2) (:register scratch))
    (wtag (:register scratch) (:literal ,int-tag) (:register scratch))
    (lsh (:register scratch) (:literal ,(- 16 *sys-len-bits*)) (:register scratch))
    (reserve (:register scratch2))
    (move (:j-register NNR) (:register scratch2))
    (add (:register scratch)
         (:register scratch2)
         (:register scratch))
    (free (:register scratch2))
    (wtag (:register scratch) (:literal ,id-tag) (:register scratch))
    (send0 (:literal 0))
    (send0 (:ref local_getc))
    (send0 ,context-slot)
    (send0 (:register scratch))
    (free (:register scratch))
    (sende0 ,(frame-base-offset return-slot))))

;;; Something should be done to handle falling into a loop
(defconversion label :label (label-name)
  `((label ,label-name)
    (move (:message (:base 1)) (:j-register A2))))

(defun lookup-into (dest)
  (if (eq (car dest) :frame)
      `((move (:literal ,(frame-base-offset dest)) (:j-register R1))
        (call (:literal ,lookup-vector)))))

;; For now, no loops
(defconversion move :move (source dest)
  (append (lookup-into dest)
          `((move ,source ,dest))))

(defconversion move-identity :move-identity (source dest)
  (append (lookup-into dest)
          `((move ,source ,dest))))

(defconversion cntt :continue-test (check-slot cont)
  ; Convert it from (:literal (:symbol :SQ-1)) to (:ref :SQ-1)
  `((move (:ref ,(second (second cont))) (:j-register R0))
    (move (:literal ,(frame-base-offset check-slot)) (:j-register R1))
    (call (:literal ,cntt-vector))))

(defconversion cntn :continue (cont)
  `((send0 (:j-register NNR))
    ; Convert it from (:literal (:symbol :SQ-1)) to (:ref :SQ-1)
    (send0 (:ref ,(second (second cont))))
    (sende0 (:j-register A2))))

(defconversion movr :move-remote (frame-ptr offset value)
  `((send0 ,frame-ptr)
    (send0 (:ref local_movr))
    (send0 ,frame-ptr)
    (send0 ,offset)
    (sende0 ,value)))

;;; This should set a flag
(defconversion terminate :terminate ()
  `((suspend)))

(defconversion le :<= (s1 s2 d)
  (append (lookup-into d)
          `((le ,s1 ,s2 ,d))))

(defconversion lt :< (s1 s2 d)
  (append (lookup-into d)
          `((lt ,s1 ,s2 ,d))))

(defconversion gt :> (s1 s2 d)
  (append (lookup-into d)
          `((gt ,s1 ,s2 ,d))))

(defconversion ge :>= (s1 s2 d)
  (append (lookup-into d)
          `((ge ,s1 ,s2 ,d))))

(defconversion j-eq := (s1 s2 d)
  (append (lookup-into d)
          `((eq ,s1 ,s2 ,d))))

(defconversion j-neq :<> (s1 s2 d)
  (append (lookup-into d)
          `((neq ,s1 ,s2 ,d))))

(defconversion j-neq2 :/= (s1 s2 d)
  (append (lookup-into d)
          `((neq ,s1 ,s2 ,d))))

(defconversion j-and :and (s1 s2 d)
  (append (lookup-into d)
          `((and ,s1 ,s2 ,d))))

(defconversion j-or :or (s1 s2 d)
  (append (lookup-into d)
          `((or ,s1 ,s2 ,d))))

(defconversion j-sub :- (s1 s2 d)
  (append (lookup-into d)
          `((sub ,s1 ,s2 ,d))))

(defconversion j-add :+ (s1 s2 d)
  (append (lookup-into d)
          `((add ,s1 ,s2 ,d))))

(defconversion j-mul :* (s1 s2 d)
  (append (lookup-into d)
          `((mul ,s1 ,s2 ,d))))

(defconversion j-not :not (s d)
  (append (lookup-into d)
          `((not ,s ,d))))

(defconversion j-abs :abs (s d)
  (append (lookup-into d)
          `((reserve (:register scratch1))
            (reserve (:register scratch2))
            (ash ,s -31 (:register scratch1))
            (xor ,s (:register scratch1) (:register scratch2))
            (sub (:register scratch2) (:register scratch1) ,d)
            (free (:register scratch1))
            (free (:register scratch2)))))
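
;;; Illustrative sketch, not part of the thesis code: the j-abs sequence above
;;; appears to be the usual branch-free absolute value, mask = s >> 31 followed
;;; by (s xor mask) - mask. The name branch-free-abs is hypothetical, and
;;; 32-bit two's-complement inputs are assumed.

```lisp
(defun branch-free-abs (s)
  "Absolute value of a 32-bit signed integer S without a conditional."
  (let ((mask (ash s -31)))        ; 0 when s >= 0, -1 (all ones) when s < 0
    (- (logxor s mask) mask)))     ; s when mask = 0; (lognot s) + 1 when mask = -1
```

;;; For example, (branch-free-abs -5) evaluates the mask to -1, the xor to 4,
;;; and the subtraction to 5.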

;;; Register contents after each step:                                   a >= b  |  a < b
(defconversion j-max :max (a b d)
  (append (lookup-into d)
          `((reserve (:register scratch1))
            (reserve (:register scratch2))
            (ge ,a ,b (:register scratch1))                             ; R1: T   | R1: F
            (wtag (:register scratch1) ,int-tag (:register scratch1))   ; R1: 1   | R1: 0
            (neg (:register scratch1) (:register scratch1))             ; R1: -1  | R1: 0
            (and (:register scratch1) ,a (:register scratch2))          ; R2: a   | R2: 0
            (not (:register scratch1) (:register scratch1))             ; R1: 0   | R1: -1
            (and (:register scratch1) ,b (:register scratch1))          ; R1: 0   | R1: b
            (or (:register scratch2) (:register scratch1) ,d)           ; a       | b
            (free (:register scratch1))
            (free (:register scratch2)))))

;;; Register contents after each step:                                   a >= b  |  a < b
(defconversion j-min :min (a b d)
  (append (lookup-into d)
          `((reserve (:register scratch1))
            (reserve (:register scratch2))
            (ge ,a ,b (:register scratch1))                             ; R1: T   | R1: F
            (wtag (:register scratch1) ,int-tag (:register scratch1))   ; R1: 1   | R1: 0
            (neg (:register scratch1) (:register scratch1))             ; R1: -1  | R1: 0
            (and (:register scratch1) ,b (:register scratch2))          ; R2: b   | R2: 0
            (not (:register scratch1) (:register scratch1))             ; R1: 0   | R1: -1
            (and (:register scratch1) ,a (:register scratch1))          ; R1: 0   | R1: a
            (or (:register scratch2) (:register scratch1) ,d)           ; b       | a
            (free (:register scratch1))
            (free (:register scratch2)))))
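
;;; Illustrative sketch, not part of the thesis code: j-max and j-min appear to
;;; select between two values with an all-ones or all-zeros mask instead of a
;;; branch. The names mask-select, mask-max and mask-min are hypothetical.

```lisp
(defun mask-select (condition then else)
  "Return THEN if CONDITION is true, else ELSE, using only bitwise operations."
  (let ((mask (- (if condition 1 0))))      ; true -> -1 (all ones), false -> 0
    (logior (logand mask then)              ; keeps THEN when mask is all ones
            (logand (lognot mask) else))))  ; keeps ELSE when mask is zero

(defun mask-max (a b) (mask-select (>= a b) a b))
(defun mask-min (a b) (mask-select (>= a b) b a))
```

;;; The two logand results are never both nonzero, so the logior simply passes
;;; the selected operand through.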


;; Not used
(defconversion loop-setup :loop-setup (label-name)
  (let ((frame-size (frame-descriptor-frame-size frame-desc))
        (k-slot (compute-slot-offset t :maximum-iterations))
        (slots-per-iteration (frame-descriptor-next-available-iteration-slot frame-desc))
        (loop-setup-label (gensym 'loop-loop)))
    `((dc (:literal ,(* frame-size (expt 2 *sys-len-bits*))))
      (move (:j-register A2) (:j-register R2))
      (wtag (:j-register R2) ,int-tag (:j-register R2))
      (add (:j-register R2) (:j-register R0) (:j-register R2))
      (wtag (:j-register R2) ,addr-tag (:j-register R2))
      (move (:j-register R2) (:j-register A1))
      (move (:frame (:base ,k-slot)) (:j-register R1))
      (sub (:j-register R1) (:literal 1) (:j-register R1))
      (move ,(+ 2 frame-size) (:j-register R2))
      (add (:j-register R2) (:frame (:base ,k-slot)) (:j-register R2))
      (dc (:literal *maskIM*))
      (or (:j-register R2) (:j-register R0) (:j-register R2))
      (move (:literal 1) (:j-register R3))
      (label ,loop-setup-label)
      (move (:j-register R2) (:frame (:loop A1)))
      (lt (:j-register R3) (:j-register R1) (:j-register R0))
      (add (:j-register R2) (:literal ,frame-n-iterations) (:j-register R2))
      (add (:j-register R3) 1 (:j-register R3))
      (bt (:j-register R0) ,loop-setup-label)
      (dc ,(lognot (logior *maskIM* *maskPC*)))
      (and (:j-register R2) (:j-register R0) (:j-register R2))
      (move (:j-register R2) (:frame (:loop (:j-register R3))))
      (move (:j-register R2) (:frame (:loop 0)))
      (move (:frame (:loop 1)) (:j-register R2))
      (and (:j-register R2) (:j-register R0) (:j-register R2))
      (add (:j-register R3) 1 (:j-register R3))
      (move (:j-register R2) (:frame (:loop (:j-register R3)))))))

;; from (:literal (:symbol :SQ-1)) to (:label :SQ-1)
(defun convert-label (l)
  `(:tagged-literal ,special-tag (:label ,(second (second l)))))

(defconversion brf :branch-false (s1 s2)
  `((bf ,s1 ,(convert-label s2))))

(defconversion brt :branch-true (s1 s2)
  `((bt ,s1 ,(convert-label s2))))

(defconversion brz :branch-zero (s1 s2)
  `((bz ,s1 ,(convert-label s2))))

(defconversion brnz :branch-not-zero (s1 s2)
  `((bnz ,s1 ,(convert-label s2))))

(defconversion br :branch (s1)
  `((br ,(convert-label s1))))

(defconversion ixcc :index-current-context (frame-base dest)
  (append (lookup-into dest)
          `((reserve (:register scratch))
            (move (:j-register A2) (:register scratch))
            (wtag (:register scratch) (:literal ,int-tag) (:register scratch))
            (add (:register scratch)
                 (:literal ,(* (literal-base-offset frame-base)
                               (expt 2 *sys-len-bits*)))
                 (:register scratch))
            (lsh (:register scratch) (:literal ,(- 16 *sys-len-bits*)) (:register scratch))
            (add (:register scratch) (:j-register NNR) (:register scratch))
            (wtag (:register scratch) (:literal ,id-tag) (:register scratch))
            (move (:register scratch) ,dest)
            (free (:register scratch)))))

;; These are okay because the operands will be suspensive
;; and caught by mutate-suspensive-operand.

(defconversion tst2 :test-2 (s1 s2 dest)
  (append (lookup-into dest)
          `((move (:tagged-literal ,boolean-tag 1) ,dest))))

(defconversion tst1 :test-1 (s1 dest)
  (append (lookup-into dest)
          `((move (:tagged-literal ,boolean-tag 1) ,dest))))

(defconversion stst1 :special-test-1 (s1)
  `((suspensive-instruction)
    (suspensive-operand ,s1)))

(defconversion retc :return-context (source dest)
  (append (lookup-into dest)
          `((move (:tagged-literal ,boolean-tag 1) ,dest))))
C.2 Convert Complex J to Simple J


;;; -*- Mode:Common-Lisp; Package:ID-COMPILER; Base:10 -*-

;;; cj-to-sj converts complex J-machine code (as produced by hybrid-to-cj)
;;; into J-machine s-expressions. The s-expressions will correspond on an
;;; exact one-to-one basis with J-machine instructions. The final step is
;;; to send it through sj-to-j, in the file of that name.

;;; Complex J-machine code differs from J-machine code in several ways:
;;;
;;; * At the beginning of every possibly suspensive instruction,
;;;     (:suspensive-instruction)
;;;   appears. For each possibly suspensive operand,
;;;     (:suspensive-operand <operand>)
;;;   These must be converted to appropriate code.
;;;
;;; * (:reserve <symbol>) and (:free <symbol>) are used to bind the value
;;;   of the symbol so that (:register <symbol>) is meaningful. The
;;;   usage is of the form:
;;;     (:reserve (:register scratch))
;;;     (:move (:j-register a2) (:register scratch))
;;;     (:free (:register scratch))
;;;   The usage is purposely verbose, to allow a change of representation,
;;;   as well as error-checking. (Reserving a second register of the same
;;;   name, using a nonreserved register, and freeing a nonreserved register
;;;   are all errors.)
;;;
;;; * Specific register names are denoted with :j-register, i.e. (:j-register 'a2).
;;;   The only time specific GPRs are used is to set up for CALLs. This is
;;;   almost certainly a violation of abstraction. This is a source of potential
;;;   bugs as well if this module trashes those registers.
;;;
;;; * No consideration is made whether the operation can fit in one J-instruction.
;;;   In many cases, it cannot. For example, this is a legal cj instruction:
;;;     (:add (:frame (:base 6))
;;;           (:literal 82932)
;;;           (:frame (:base 9)))
;;;
;;; * There are both :literal and :tagged-literal operands.

;;; The register allocation is correct and stable, to the best of my knowledge.
;;; It is non-optimal but acceptable.

(in-package 'id-compiler)

;;; For some reason that I can't figure out, I'm having trouble getting neq.
(defmacro neq (a b)
  `(not (eq ,a ,b)))

(defcompiler-module convert-complex-j-to-simple-j id-compiler
  (:input vnd-instructions code-block)   ; A lie
  (:before-function procedure reset-cj-to-sj-system)
  (:function convert-cj-to-sj)
  (:output vnd-instructions code-block)) ; Yuck! I've got to fix these abstractions

(defun reset-cj-to-sj-system ()
  (setq *j-instructions* nil)
  (setq *virtual-registers* nil)
  (setq *free-register-list* genl-purpose-regs))

(defun my-error (a b c)
  (print c)
  (break)
  (error a b c))

;;; These are functions to specify basic J-machine characteristics.

(defun make-tagged-literal (l)
  (cond ((numberp l) `(:tagged-literal ,int-tag ,l))
        ((referencep l) `(:tagged-literal ,special-tag ,l))
        ((eq (car l) :tagged-literal) l)
        ; Converts from (:label (:literal (:symbol :foobar))) to
        ; (:tagged-literal special-tag (:label (:symbol :foobar)))
        ((eq (car l) :label)
         (list :tagged-literal special-tag (list :label l)))
        ; (list :tagged-literal special-tag (list :label (second (second (second l))))))
        ((eq (car l) :literal)
         (if (listp (second l))
             (if (eq (car (second l)) :integer)
                 `(:tagged-literal ,int-tag ,(second (second l)))
                 (list :tagged-literal special-tag (second l)))
             (list :tagged-literal int-tag
                   (if (listp (second l))
                       (if (eq (car (second l)) :integer)
                           (second (second l))
                           (my-error :fatal nil "Illegal format of literal"))
                       (second l)))))
        (t nil)))

;; Only converts if appropriate
(defun make-tagged-literal-if-appropriate (l)
  (let ((result (make-tagged-literal l)))
    (if result
        result
        l)))

(defun hex-value (h)
  (cond ((and (>= h #\0) (<= h #\9)) (- h #\0))
        ((and (>= h #\A) (<= h #\F)) (+ 10 (- h #\A)))
        ((and (>= h #\a) (<= h #\f)) (+ 10 (- h #\a)))))

(defmacro hex-to-dec (h-string)
  (do ((count (- (length h-string) 1) (- count 1))
       (value 0 (+ (* value 10)
                   (hex-value (char h-string count)))))
      ((< count 0)
       value)))
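
;;; Illustrative sketch, not part of the thesis code: hex-value above does
;;; arithmetic directly on characters, which ANSI Common Lisp does not allow,
;;; so it presumably relies on the dialect the compiler was written in. A
;;; portable way to read a hexadecimal string is the standard parse-integer
;;; with :radix 16. The name hex-string-to-integer is hypothetical.

```lisp
(defun hex-string-to-integer (h-string)
  "Parse H-STRING as an unsigned hexadecimal number."
  (parse-integer h-string :radix 16))
```

;;; With this helper, "ff" reads as 255 and "3ff" as 1023.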

(defconstant op0-literals (list
                            (cons sym-tag 0)      ; nil
                            (cons boolean-tag 0)  ; false
                            (cons boolean-tag 1)  ; true
                            (cons int-tag (hex-to-dec "80000000"))
                            (cons int-tag (hex-to-dec "ff"))
                            (cons int-tag (hex-to-dec "3ff"))
                            (cons int-tag (hex-to-dec "ffff"))
                            (cons int-tag (hex-to-dec "fffff"))))

(defun op0-literal-p (l)
  (op0-literal-p-inner l nil))

(defun op0-extended-literal-p (l)
  (op0-literal-p-inner l t))

(defun tagged-literal-p (op)
  (eq (car op) :tagged-literal))

(defun j-register-p (op)
  (or (eq (car op) :j-register)
      (eq (car (translate-operand op)) :j-register)))

;; To distinguish it from lUcc frames.
(defun j-framep (op)
  (eq (car op) :frame))

(defun j-offset-p (op)
  (or (j-framep op)
      (j-messagep op)
      (j-temporary op)))

(defun j-temporary (op)
  (eq (car op) :temporary))

(defun j-messagep (op)
  (eq (car op) :message))

(defun labelp (op)
  (eq (car op) :label))

(defun j-symbolp (op)
  (eq (car op) :symbol))

(defun referencep (op)
  (eq (car op) :ref))

(defun bindingp (op)
  (eq (car op) :binding))

;(print (output-tagged-literal (make-tagged-literal '(:literal int))))

(defun op0-literal-p-inner (l extendedp)
  (if (tagged-literal-p l)
      (let* ((tag (second l))
             (value (if (eq tag int-tag)
                        (eval (third l)) ; To allow us to use symbols instead of ints
                        (third l))))
        (cond ((numberp value)
               (if (member (cons tag value) op0-literals :test #'equal)
                   t
                   (if extendedp
                       (and (eq tag int-tag)
                            (>= value -64)
                            (<= value 63))
                       (and (eq tag int-tag)
                            (>= value -16)
                            (<= value 15)))))
              ((labelp value) nil) ; Safe assumption
              (t nil)))))
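
;;; Illustrative sketch, not part of the thesis code: the range checks in
;;; op0-literal-p-inner look like tests for whether an integer fits a signed
;;; immediate field. Assuming two's-complement fields (the field widths
;;; themselves are an assumption), the check can be written once for any
;;; width. The name fits-signed-field-p is hypothetical.

```lisp
(defun fits-signed-field-p (value bits)
  "True if the integer VALUE fits in a BITS-wide two's-complement field."
  (let ((limit (expt 2 (1- bits))))   ; e.g. 16 for a 5-bit field
    (and (integerp value)
         (>= value (- limit))
         (< value limit))))
```

;;; Under that assumption a 5-bit field admits -16 through 15 and a 7-bit
;;; field admits -64 through 63.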

(defun op0-operand-p (op)
  (op0-operand-p-inner op nil))

(defun op0-extended-operand-p (op)
  (op0-operand-p-inner op t))

(defun op0-operand-p-inner (op extendedp)
  (cond ((j-register-p op)
         (let* ((actual (translate-operand op))
                (value (second actual)))
           (or (genl-purpose-reg-p op)
               (eq value 'A0) (eq value 'A1) (eq value 'A2) (eq value 'A3))))
        ((tagged-literal-p op)
         (op0-literal-p-inner op extendedp))
        ((j-offset-p op)
         (let ((offset (base-offset op)))
           (cond ((numberp offset)
                  (if extendedp
                      (and (< offset 63) (>= offset 0))
                      (and (< offset 16) (>= offset 0))))
                 ((genl-purpose-reg-p offset)
                  t))))))

;; Register-oriented op0 mode
(defun rop0-operand-p (op)
  (let* ((operand (translate-operand op))
         (value (second operand)))
    (and (j-register-p operand)
         (or (genl-purpose-reg-p op)
             (member value '(A0 A1 A2 A3 NNR IP)))))) ; More exist, but these only ones used

(defconstant genl-purpose-regs '(R3 R2 R1 R0))

(defun genl-purpose-reg-p (operand)
  (let ((op (translate-operand operand)))
    (or (bindingp op)
        (and (j-register-p op)
             (member (second op) genl-purpose-regs)))))

(defun basic-add (arg1 &rest args)
  (+ (if arg1 1 0)
     (count t args)))

;;; Current register scheme due in part to let*.
;;; This system is still primitive. Some notable omissions:
;;; - It might reload a register with a value already in it.

(defvar *free-register-list*)
(setq *free-register-list* genl-purpose-regs)

(defvar symbols-bound-to-regs)
(setq symbols-bound-to-regs nil)

(defun request-register-inner ()
  (if (null *free-register-list*)
      (my-error :fatal nil "No registers available in request-register-inner")
      (let* ((temp (remove 'R0 *free-register-list*))
             (reg (if (null temp) 'R0 (car temp))))
        (setq *free-register-list* (remove reg *free-register-list*))
        reg)))

(defun request-appropriate-register (item)
  (if (genl-purpose-reg-p item)
      (my-error :fatal nil "Reg-reg move requested!"))
  (if (and (tagged-literal-p item)
           (not (op0-extended-literal-p item)))
      (if (member 'R0 *free-register-list*)
          (progn (setf *free-register-list* (remove 'R0 *free-register-list*))
                 (setf symbols-bound-to-regs (cons (cons (gensym 'reg) 'R0) symbols-bound-to-regs))
                 `(:binding ,(caar symbols-bound-to-regs)))
          ;; If we get here, we need to slide R0 into another register
          (let* ((r0-pair (rassoc 'R0 symbols-bound-to-regs))
                 (new-reg (request-register-inner)) ; Get another register
                 (cur-name (gensym)))               ; Name to return with new register
            ; Emit the move -- to a global??
            (if (null r0-pair)
                (my-error :fatal nil "R0 invariant violated"))
            (emit-j-instruction `(move (:j-register R0) (:j-register ,new-reg)))
            (setf (cdr r0-pair) new-reg)
            (setq symbols-bound-to-regs
                  (cons (cons cur-name 'R0)
                        symbols-bound-to-regs))
            `(:binding ,cur-name)))
      (request-any-register)))

(defun request-any-register ()
  (let ((reg (request-register-inner)))
    (setq symbols-bound-to-regs (cons (cons (gensym 'reg) reg) symbols-bound-to-regs))
    `(:binding ,(caar symbols-bound-to-regs))))

(defun return-register (reg)
  (if (eq (car reg) :binding)
      (let ((pair (assoc (second reg) symbols-bound-to-regs)))
        (if (null pair)
            (my-error :fatal nil "Illegal binding freed in return-register")
            (let ((actual (cdr pair)))
              (setf *free-register-list* (cons actual *free-register-list*))
              (setq symbols-bound-to-regs (remove pair symbols-bound-to-regs)))))
      (my-error :fatal nil "Illegal register return")))

(defun binding-to-register (symbol)
  (if (and (listp symbol)
           (eq (car symbol) :binding))
      (let ((pair (assoc (second symbol) symbols-bound-to-regs)))
        (if (null pair)
            (my-error :fatal nil "Binding not found")
            `(:j-register ,(cdr pair))))))


;;; Emit commands

;; This "forces" register assignments when the code is emitted.
(defun translate-operand (op)
  (if (listp op)
      (cond ((null op) nil)
            ((eq (car op) :binding) (binding-to-register op))
            ((eq (car op) :register) (translate-virtual-register op))
            (t (cons (translate-operand (car op))
                     (translate-operand (cdr op)))))
      op))

(defvar *j-instructions* nil)
(setq *j-instructions* nil)

(defun emit-j-instruction (inst &key (pass-through nil))
  (let* ((opcode (car inst))
         (operands (if pass-through
                       (cdr inst)
                       (mapcar #'translate-operand (cdr inst))))
         (instruction (cons opcode operands)))
    (setq *j-instructions* (append *j-instructions*
                                   (list instruction)))
    ; For trace purposes, just return latest new instruction
    instruction))

(defun emit-j-instructions (ilist)
  (mapcar #'emit-j-instruction ilist))


;;; Movement routines

(defun make-legal-move (source dest)
  (if (or (genl-purpose-reg-p source)
          (genl-purpose-reg-p dest))
      ; At least one is a register
      (make-legal-move-with-register source dest)
      (let ((register (request-appropriate-register source)))
        (make-legal-move source register)
        (make-legal-move register dest)
        (return-register register))))

;;; Possible operands include:
;;;   (:register ..)
;;;   (:j-register ..)
;;;   (:binding ..)
;;;   (:frame (:base #))
;;;   (:tagged-literal # #)

(defun make-legal-move-with-register (source dest)
  (if (genl-purpose-reg-p source)
      (if (or (rop0-operand-p dest)
              (op0-extended-operand-p dest))
          (emit-j-instruction `(move ,source ,dest))
          ;; If we get here, source is a register, but dest is too big
          (make-legal-big-move source dest))
      (if (genl-purpose-reg-p dest)
          (if (or (rop0-operand-p source)
                  (op0-extended-operand-p source))
              (emit-j-instruction `(move ,source ,dest))
              (make-legal-big-move source dest)))))

;;; make-legal-big-move called when one operand is a register and the
;;; other is something that can't be represented in op0 or register-oriented
;;; op0 mode, such as a big literal or a frame value with a large offset.

(defun make-legal-big-move (source dest)
  (if (genl-purpose-reg-p source)
      ;; destination must be frame (or equiv.) (i.e. can't be literal)
      (let* ((offset (base-offset dest))
             (tagged-offset (make-tagged-literal offset))
             (reg (request-appropriate-register tagged-offset))
             (new-operand (replace-offset reg dest)))
        (make-legal-move tagged-offset reg)
        (make-legal-move source new-operand)
        (return-register reg))
      ;; If we get here, dest must be a gpr
      (cond ((tagged-literal-p source)
             (if (op0-literal-p source)
                 (emit-j-instruction (list 'move source dest))
                 (let ((actual-dest (translate-operand dest)))
                   (if (equal actual-dest '(:j-register R0))
                       (emit-j-instruction `(dc ,source))
                       (message :fatal nil "R0 not reserved when required")))))
            (t (my-error :fatal nil "Unhandled case in make-legal-big-move")))))


(defstruct code-bundle
  operand-list
  regs-to-be-freed)

(defun bundle-return-registers (bundle)
  (mapcan #'return-register (code-bundle-regs-to-be-freed bundle))
  (setf (code-bundle-regs-to-be-freed bundle) nil)
  bundle)

(defun convert-cj-to-sj (cb)
  (mapc #'make-legal (dataflow-graph-root-set cb))
  (setf (dataflow-graph-root-set cb) *j-instructions*)
  cb)

(defun make-legal (instruction)
  (let* ((opcode (car instruction))
         (operands (if (eq opcode 'hybrid-instruction)
                       (cdr instruction)
                       (mapcar #'make-tagged-literal-if-appropriate (cdr instruction))))
         (num-ops (length operands))
         (instruction (cons opcode operands)))
    (if (pseudo-op-p opcode)
        (process-pseudo-op opcode operands)
        (cond ((= num-ops 0) ; Typically, suspend
               (emit-j-instruction instruction))
              ((= num-ops 1) ; Typically, send or branch
               (if (eq opcode 'br)
                   (make-branch opcode operands)
                   (make-into-form opcode
                                   operands
                                   (cons #'ext-op0 'source))))
              ;; Shouldn't something for branches be here?
              ((= num-ops 2) ; Typically move, unary op, or bcc
               (cond ((equal opcode 'move)
                      (make-legal-move (first operands) (second operands)))
                     ((or (equal opcode 'neg) (equal opcode 'not) (equal opcode 'rtag))
                      (make-into-form opcode
                                      operands
                                      (cons #'ext-op0 'source)
                                      (cons #'gpr 'dest)))
                     ((member opcode '(bf bt bz bnz))
                      (make-branch opcode operands))
                     (t
                      (message :fatal nil "Illegal opcode in make-legal"))))
              ((= num-ops 3) ; Typically binary op (all have same format)
               ;; It should try exchanging the first two operands to execute more cheaply
               (make-into-form opcode
                               operands
                               (cons #'gpr 'source)
                               (cons 'op0 'source)
                               (cons 'gpr 'dest)))))))

;; Some conditional branches can't be encoded into one instruction; additionally, in
;; my simple one-pass assembler, I can't determine displacements, etc. Hence, all
;; jumps will be converted in a pessimistic way, e.g.
;;       bz  R1, label1
;; becomes
;;       bnz R1, new_label
;;       br  label1
;;   new_label:
;; The types of branches are: bf, bt, bz, bnz, bnil, bnnil.
;; (The last two aren't used by hybrid stuff but are in for completeness.)

(defvar branch-opposites '((bf . bt) (bz . bnz) (bnnil . bnil)
                           (bt . bf) (bnz . bz) (bnil . bnnil)))

(defun make-branch (opcode operands)
  (if (eq opcode 'br) ; Absolute branch
      (make-legal `(move ,(car operands) (:j-register ip)))
      (let (;(new-label (gensym 'jog))
            (opposite-opcode (cdr (assoc opcode branch-opposites)))
            (condition (first operands))
            (original-label (second operands)))
        ; The following line would give us an infinite loop!
        ; (make-legal `(,opposite-opcode ,condition (:label ,new-label)))
        ; Instead, do a violation of abstraction:
        (make-into-form opposite-opcode `(,condition (:tagged-literal ,int-tag 2))
                        (cons #'gpr 'source) (cons #'ext-op0 'source))
        (make-legal `(br ,original-label))
        (make-legal `(align))
        ; (make-legal `(label (:literal (:symbol ,new-label))))
        )))
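
;;; Illustrative sketch, not part of the thesis code: the pessimistic branch
;;; conversion described in the comment before make-branch, written as a pure
;;; list transformation. It uses an explicit skip label, as in that comment,
;;; whereas make-branch itself emits a fixed displacement followed by (align).
;;; The names *opposite-branch* and expand-branch are hypothetical.

```lisp
(defparameter *opposite-branch*
  '((bf . bt) (bt . bf) (bz . bnz) (bnz . bz) (bnil . bnnil) (bnnil . bnil)))

(defun expand-branch (opcode condition label skip-label)
  "Rewrite (OPCODE CONDITION LABEL) as an inverted short branch over a long jump."
  (let ((opposite (cdr (assoc opcode *opposite-branch*))))
    (list (list opposite condition skip-label) ; skip the jump when the test fails
          (list 'br label)                     ; unconditional jump to the target
          (list 'label skip-label))))          ; execution resumes here otherwise
```

;;; For instance, (expand-branch 'bz 'r1 'label1 'new-label) yields
;;; ((bnz r1 new-label) (br label1) (label new-label)).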

(defun replace-offset (reg operand)
  (list (car operand)
        (list
          (caadr operand)
          reg)))

(defun gpr (arg dir bundle)
  (if (genl-purpose-reg-p arg)
      (make-code-bundle
        :operand-list (append (code-bundle-operand-list bundle) (list arg))
        :regs-to-be-freed (code-bundle-regs-to-be-freed bundle))
      (make-move-with-register arg dir bundle)))

(defun make-move-with-register (arg dir bundle)
  (let ((reg (request-appropriate-register arg)))
    (if (eq dir 'source)
        (progn
          (make-legal-move arg reg)
          (make-code-bundle
            :operand-list (append (code-bundle-operand-list bundle) (list reg))
            :regs-to-be-freed (append (code-bundle-regs-to-be-freed bundle) (list reg))))
        ;; dest
        (progn
          (make-legal-move reg arg)
          (make-code-bundle
            :operand-list (append (code-bundle-operand-list bundle) (list reg))
            :regs-to-be-freed (append (code-bundle-regs-to-be-freed bundle) (list reg)))))))

(defmacro base-tagged-offset (a)
  `(make-tagged-literal (base-offset ,a)))

(defun op0 (arg dir bundle)
  (if (op0-operand-p arg)
      (make-code-bundle
        :operand-list (append (code-bundle-operand-list bundle) (list arg))
        :regs-to-be-freed (code-bundle-regs-to-be-freed bundle))
      (make-big-item-into-gpr arg dir bundle)))

(defun make-big-item-into-gpr (arg dir bundle)
  ;; There are two possibilities:
  ;; (1) it is a frame reference that we could convert (in which case direction is irrelevant)
  (if (eq (car arg) :frame)
      (let* ((value (base-tagged-offset arg))
             (reg (request-appropriate-register value)))
        (make-legal-move value reg)
        (make-code-bundle
          :operand-list (append (code-bundle-operand-list bundle) (list (replace-offset reg arg)))
          :regs-to-be-freed (append (code-bundle-regs-to-be-freed bundle) (list reg))))
      ;; (2) it must be stored into a separate register
      (make-move-with-register arg dir bundle)))

(defun ext-op0 (arg dir bundle)
  (if (op0-extended-operand-p arg)
      (make-code-bundle
        :operand-list (append (code-bundle-operand-list bundle) (list arg))
        :regs-to-be-freed (code-bundle-regs-to-be-freed bundle))
      (make-big-item-into-gpr arg dir bundle)))

(defun guaranteed-ok (arg dir bundle)
  bundle)

(defun process-operand-if-source (operands patterns count bundle)
  (if (>= count (length operands))
      bundle
      (let ((op (nth count operands))
            (pat (nth count patterns)))
        (if (eq (cdr pat) 'source)
            (apply (car pat) (list op 'source bundle))
            bundle))))

(defun symbol> (x y)
  (string> (string x) (string y)))

(defun process-operand-if-dest (op pat bundle)
  ;; Process only if destination
  (if (neq (cdr pat) 'dest)
      nil
      (list
        (if (genl-purpose-reg-p op)
            op
            (let ((reg (request-any-register)))
              (setf (code-bundle-regs-to-be-freed bundle)
                    (cons reg (code-bundle-regs-to-be-freed bundle)))
              reg)))))

;; Unfortunately, it seems we have to code in some specifics to keep
;; the code from being too complex. The assumptions are:
;;
;; - An instruction has up to two sources.
;;
;; - The last operand is the only one that can be a destination.
;;   If it is a destination, it is also a gpr (except for moves,
;;   which are handled specially).

(defun make-into-form (opcode operands &rest pattern)
  ; First, check that same # of operands as patterns
  (if (/= (length operands)
          (length pattern))
      (my-error :fatal nil "Not enough operands for pattern")
      ; Generate the code for up to two sources and up to one dest
      (let* ((step-one (process-operand-if-source operands pattern 0 (make-code-bundle)))
             (bundle (process-operand-if-source operands pattern 1 step-one))
             (dest-reg (process-operand-if-dest (car (last operands))
                                                (car (last pattern))
                                                bundle))) ; ! Bundle mutated !
        ; Emit instruction
        (emit-j-instruction (cons opcode (append (code-bundle-operand-list bundle)
                                                 dest-reg)))
        ; Emit the code (if any) to put result into destination
        (if (and dest-reg
                 (not (equal dest-reg (last operands)))) ; eq and eql too strong for lists
            (make-legal-move (car dest-reg) (car (last operands))))
        ; Free registers
        (mapcar #'return-register (code-bundle-regs-to-be-freed bundle)))))


;;; Pseudo-op functions, for :reserve and :free, :suspensive*, and :label

;;; - (:reserve <symbol>) and (:free <symbol>) are used to bind the value
;;;   of the symbol so that (:register <symbol>) is meaningful. The
;;;   usage is of the form:
;;;     (:reserve (:register scratch))
;;;     (:move (:j-register a2) (:register scratch))
;;;     (:free (:register scratch))
;;;   The usage is purposely verbose, to allow a change of representation,
;;;   as well as error-checking. (Reserving a second register of the same
;;;   name, using a nonreserved register, and freeing a nonreserved register
;;;   are all errors.)

(defvar evirtual-registers*) 

(defvar *pseudo-op-list*) 

(defnn pseudo-op-p (op) 

(assoc op *pseudo-op-list*)) 

(defun process-pseudo-op (opcode operands) 

(apply (edr (assoc opcode *pseudo-op-list*)) (list operands))) 

(defun reserve-virtual-register (operands)
  (let* ((operand (first (car operands)))
         (name (second (car operands))))
    (if (neq operand :register)
        (my-error :fatal nil "Illegal :reserve syntax")
        ; Check if it's already allocated
        (if (assoc name *virtual-registers*)
            (my-error :fatal nil "An attempt was made to re-allocate a virtual register")
            (let ((reg (request-any-register))) ; This is a TEMPORARY measure -- it might need R0
              (setq *virtual-registers*
                    (cons (cons name reg)
                          *virtual-registers*))))))
  nil)

(defun free-virtual-register (operands)
  (let ((operand (first (car operands)))
        (name (second (car operands))))
    (if (neq operand :register)
        (my-error :fatal nil "Illegal :free syntax")
        ; Check if it's already allocated
        (if (assoc name *virtual-registers*)
            (progn
              (return-register (cdr (assoc name *virtual-registers*)))
              (setq *virtual-registers*
                    (remove (assoc name *virtual-registers*)
                            *virtual-registers*)))
            (my-error :fatal nil "An attempt was made to free an unallocated virtual register"))))
  nil)
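
;;; A minimal, self-contained sketch of the reserve/free protocol above, for
;;; illustration only. It is not part of the compiler: the names
;;; *demo-registers*, demo-reserve, and demo-free are hypothetical, the
;;; register is supplied by the caller instead of coming from
;;; request-any-register, and errors are signalled with ERROR rather than
;;; my-error.

(defvar *demo-registers* nil)   ; alist of (name . register)

(defun demo-reserve (name reg)
  ; Reserving a name that is already bound is an error
  (when (assoc name *demo-registers*)
    (error "Virtual register ~S is already reserved" name))
  (push (cons name reg) *demo-registers*)
  nil)

(defun demo-free (name)
  ; Freeing a name that was never reserved is an error
  (let ((entry (assoc name *demo-registers*)))
    (unless entry
      (error "Virtual register ~S is not reserved" name))
    (setq *demo-registers* (remove entry *demo-registers*))
    nil))

;;; Usage, mirroring (:reserve (:register scratch)) ... (:free (:register scratch)):
;;;   (demo-reserve 'scratch 'r1)   ; binds scratch
;;;   (demo-free 'scratch)          ; releases it; a second demo-free signals an error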

;; There are two things this could be called for:
;;   (:register <name>)
;; or
;;   (:register 9)
;; The meanings are very different. The first was a temporary assigned by (my) hybrid-to-cj.
;; The latter was a temporary assigned by Iannucci's generate-vnd-instruction. Both map to
;; the same thing however. For now, use [9,A0] for the latter. Inefficient, but correct.
;; I implement this in hybrid-to-cj but describe it here, because the real fix should be here.
;; A solution I considered but which is NOT implemented:
;; Because reserve & free are emitted for the first type and omitted
;; for the second, we have to assume a little: A first access is an implicit reserve, and a
;; second is an implicit free. This matches how Iannucci uses registers (I think!) for
;; non-loop-setup.

(defun translate-virtual-register (ref)
  (let ((name (second ref)))
    (translate-operand (cdr (assoc name *virtual-registers*)))))

;; Originating J-machine code here might be something of a violation of abstraction.

(defvar suspensive-binding)

(defun make-suspensive-instruction-code (dummy)
  (let* ((name (gensym 'suspensive))
         (label `(:literal (:symbol ,name)))
         (ref (make-tagged-literal `(:ref ,name))))
    (make-legal `(label ,label))
    (setq suspensive-binding (request-appropriate-register ref)) ; R0
    (make-legal `(move (:message (:base 0)) (:j-register 12)))
    (make-legal-move ref suspensive-binding)))

;   (align)
;   (make-legal `(move (:j-register ip) (:message (:base 0))))
;   (make-legal `(move (:message (:base 0)) (:j-register 12)))

;; This is inefficient.

(defun make-presence-check (operands)
  (let ((op (car operands))
        (reg (request-any-register)))
    (make-legal `(rtag ,op ,reg))
    (return-register reg)))

(defun end-suspensive-part (dummy)
  (return-register suspensive-binding)
  (setq suspensive-binding nil))

(defun handle-label (operands)
  (emit-j-instruction (list 'label (car operands))))

(defun pass-through-hybrid-instruction (operands)
  (emit-j-instruction `(hybrid-instruction ,@operands) :pass-through t))

(setq *pseudo-op-list* (list (cons 'reserve #'reserve-virtual-register)
                             (cons 'free #'free-virtual-register)
                             (cons 'suspensive-instruction #'make-suspensive-instruction-code)
                             (cons 'suspensive-operand #'make-presence-check)
                             (cons 'suspensive-check-done #'end-suspensive-part)
                             (cons 'label #'handle-label)
                             (cons 'hybrid-instruction #'pass-through-hybrid-instruction)))
C.3 Convert Simple J to Assembly


;;; -*- Mode: Common-Lisp; Package: ID-COMPILER; Base: 10 -*-

;;;; sj-to-asm.lisp converts MDP code from s-expressions into format suitable for MDPSim.
;;;; - Convert from s-expressions to strings,
;;;;   which includes putting in commas and newlines
;;;; - Replacing characters like : and - with _ to
;;;;   make legal MDP identifiers.
;;;; - Handles references, labels, and symbols.

(in-package 'id-compiler)

(defcompiler-module convert-sexp-j-to-asm id-compiler
  (:input vnd-instructions code-block) ; A lie
  (:before-function procedure reset-sj-to-asm-system)
; (:options vnd-output-file)
  (:function convert-sj-to-asm))

(defmacro cat (&rest args)
  `(concatenate 'string ,@args))

(defvar *output-string*)

(defvar *msg-ref-list*)

(defvar *ip-ref-list*)

(defvar operand-list)

(defun reset-sj-to-asm-system ()
  (setq *ip-ref-list* nil)
  (setq *msg-ref-list* nil)
  (setq *output-string* ""))

(defun make-j-string (sym)
  (let ((s (copy-seq (my-string sym))))
    (make-j-string-inner s 0)
    s))

(defun make-j-string-inner (s index)
  (if (< index (length s))
      (let ((c (char s index)))
        (if (or (eql c #\:)
                (eql c #\-))
            (setf (char s index) #\_))
        (make-j-string-inner s (1+ index)))))
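
;;; For comparison, a non-destructive sketch of the same character replacement
;;; using the standard SUBSTITUTE-IF. The name make-j-string-sketch is
;;; hypothetical, and the sketch assumes its argument is a string or a symbol
;;; (numbers, which my-string handles, are not covered here).

(defun make-j-string-sketch (x)
  ; Replace every : and - with _ in a fresh copy of the name
  (substitute-if #\_
                 #'(lambda (c) (or (eql c #\:) (eql c #\-)))
                 (string x)))

;;; (make-j-string-sketch "loop-body:entry") returns "loop_body_entry"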

(defun asm-output-opcode (opcode)
  (setq operand-list nil)
  (if opcode
      (setq *output-string* (cat *output-string* (format nil (string opcode))))
      (setq *output-string* (cat *output-string* (format nil "~%")))))

(defun asm-output-label (l)
  (asm-output-opcode nil)
  (asm-output-operand (cat (make-j-string (second (third l))) ":"))
  (asm-output-end-line))

(defun asm-output-align ()
  (asm-output-opcode nil)
  (asm-output-operand ":")
  (asm-output-end-line))

(defun asm-output-comment (text)
  (setq *output-string* (cat *output-string* (format nil "~%~%»~S" text))))


(defun asm-output-operand (operand)
  (setq operand-list (nconc operand-list (list operand))))

(defun asm-output-end-line ()
  (asm-output-end-line-inner (length operand-list) operand-list))

(defun asm-output-end-line-inner (len ops)
  (if (> len 0)
      (progn
        (setq *output-string* (cat *output-string* (first ops)))
        (if (> len 1)
            (setq *output-string* (cat *output-string* ", ")))
        (asm-output-end-line-inner (- len 1) (cdr ops)))))
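
;;; The comma-joining done by asm-output-end-line-inner can also be written
;;; with FORMAT's list iteration directives. A sketch only:
;;; join-operands-sketch is a hypothetical name, and it takes the operand
;;; strings as an argument instead of reading the operand-list variable.

(defun join-operands-sketch (ops)
  ; ~{ ... ~} iterates over OPS; ~^ stops before the separator after the last item
  (format nil "~{~A~^, ~}" ops))

;;; (join-operands-sketch '("R0" "[2,A3]")) returns "R0, [2,A3]"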

(defvar *current-frame-descriptor*)

(defun convert-sj-to-asm (cb)
  (let ((name (dataflow-graph-get cb :procedure-name))
        (instructions (dataflow-graph-root-set cb)))
    ; Yuck: Do this right. On second thought, don't bother.
    (setq *current-frame-descriptor* (dataflow-graph-get cb :frame-descriptor))
    (mapc #'convert-sj-instruction-to-asm instructions)
    (let ((filename (open (make-pathname :type "MDP"
                                         :defaults (cat "o:>ellens>" (string name)))
                          :direction :output)))
      (princ *output-string*)
      ; Output the module
      (format filename "module ~a~%" name)
      (princ *output-string* filename)
      (format filename "~%end~%")
      ; Output the references
      (loop for ref in (set-difference *msg-ref-list* '(local.movr local.getc))
            doing (format filename "ref ~a_msg_ref = MSG: (((~a+~a.loc)<<~d))+2~%"
                          (make-j-string ref)
                          (make-j-string ref)
                          name
                          *sys-len-bits*))
      (loop for label in *ip-ref-list*
            doing (format filename "ref ~a_ip_ref = IP: (((~a+~a.loc)<<~d))+ABSOLUTE~%"
                          (make-j-string label)
                          (make-j-string label)
                          name
                          *sys-len-bits*))
      ; Bogus for loops

      (format filename "ref ~a_codeblock_ref = CB: (~a.loc<<16)+~D~%"

              (dataflow-graph-get cb :procedure-name)
              (dataflow-graph-get cb :procedure-name)
              (frame-descriptor-next-available-scratch-slot *current-frame-descriptor*))
      (close filename))))

(defun convert-sj-instruction-to-asm (instruction)
  (let ((operator (car instruction)))
    (cond ((eq operator 'label) ; special cases
           (asm-output-label (cadr instruction)))
          ((eq operator 'align)
           (asm-output-align))
          ((eq operator 'hybrid-instruction)
           (begin-hybrid-instruction-conversion (cdr instruction)))
          (t
           (asm-output-opcode operator)
           (mapc #'convert-sj-operand-to-asm (cdr instruction))
           (asm-output-end-line)))))
(defun begin-hybrid-instruction-conversion (text)
  (asm-output-comment text))

(defun convert-sj-operand-to-asm (operand)
  (asm-output-operand
    (case (car operand)
      ((:tagged-literal) (output-tagged-literal operand))
      ((:j-register) (my-string (second operand)))
      ((:frame) (format nil "[~S,A3]" (cadadr operand)))
      ((:message) (format nil "[~S,A3]" (cadadr operand)))
      ((:temporary) (format nil "[~S,A0]" (cadadr operand)))
      )))

(defun output-tagged-literal (operand)
  (let ((tag (second operand)))
    (if (eq tag special-tag)
        ; Everything as REFs not labels (labels would be more appropriate for branches)
        (cond ((eq (car (third operand)) :code-block)
               ; It goes without saying that the code-block ref will be output
               (cat (make-j-string (second (third operand))) "_codeblock_ref}"))
              ((eq (car (third operand)) :ref)
               (setq *msg-ref-list* (remove-duplicates (cons (second (third operand))
                                                             *msg-ref-list*)))
               (cat (make-j-string (second (third operand))) "_msg_ref}"))
              ((eq (car (third operand)) :label)
               (setq *ip-ref-list* (remove-duplicates (cons (second (third operand)) *ip-ref-list*)))
               (cat (make-j-string (second (third operand))) "_ip_ref}"))
              (t
               (break)))
        (cond ((eq tag int-tag)
               (format nil "~D" (third operand)))
              ((and (eq tag boolean-tag) (numberp (third operand)))
               (if (= 0 (third operand))
                   "false"
                   "true"))
              (t
               (format nil "~A:~D" (string tag) (third operand)))))))

(defun my-string (x)
  (if (numberp x)
      (format nil "~D" x)
      (string x)))